DOCUMENT RESUME



ED 210 317 



TH 620 GOB 



AUTHOR          Anderson, Beverly L.
TITLE           Guide to Adult Functional Literacy Assessment Using
                Existing Tests.
INSTITUTION     Northwest Regional Educational Lab., Portland, Oreg.
SPONS AGENCY    National Inst. of Education (ED), Washington, D.C.
PUB DATE        Jun 81
GRANT           400-80-0105
NOTE            113p.
EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     Adult Counseling; *Adult Literacy; Adults; Basic
                Skills; *Functional Literacy; Minimum Competencies;
                *Personnel Evaluation; Reading Tests; *Test
                Selection; Writing Evaluation



ABSTRACT 

This is a guide designed for professionals who need to assess the
literacy ability of adults. The three general categories of literacy
skills which are distinguished include generic literacy skills, everyday
skills, and job-related skills. Functional literacy has been defined in
various ways; therefore, the assessment of it is very difficult. The
purpose of literacy assessment must be established before selecting the
appropriate instrument. The criteria to be considered are usability,
validity, and reliability. The decisions to be made before a specific
test or assessment approach are identified relate to the categories of
literacy, the purpose for testing, the uses and users of test results,
examinee characteristics and logistics. General guidelines for reviewing
existing published or unpublished tests are examined. These include
preliminary screening and technical quality review. Assessment of
everyday literacy activities may be determined through the use of one of
several published tests designed for this purpose. Assessment of
on-the-job literacy is usually limited to tests of clerical ability.
Appendices to the guide include suggested readings on definitions of
functional literacy, published tests of basic writing and reading skills,
and procedures for scoring writing samples. (DWH)



Reproductions supplied by EDRS are the best that can be made
from the original document.




GUIDE TO ADULT 
FUNCTIONAL LITERACY 
ASSESSMENT 
Using Existing Tests 

Beverly L. Anderson 
June 1981 



Functional Literacy Project 




Northwest Regional Educational Laboratory 

300 S.W. Sixth Avenue Portland, Oregon 97204 






This work is published by the Northwest Regional Educational Laboratory,
a private nonprofit corporation. The work contained herein has been
developed under grant 400-80-0105 with the National Institute of
Education (NIE), Department of Education. The opinions expressed in this 
publication do not necessarily reflect the position of the National 
Institute of Education, and no official endorsement by the Institute 
should be inferred. Mention of trade names, commercial products or 
organizations does not imply endorsement by the U.S. Government. 



NIE Project Officer: Monte Penney 



Acknowledgements 



Appreciation is expressed to many people for their contribution to 
this Guide. Stephen Reder and Karen Reed Green of the Functional 
Literacy Project were heavily involved in designing the structure of the 
Guide as well as in reviewing and contributing to early versions. Jean 
Greene conducted interviews with practitioners to help determine what 
content should be included. Sharon Tamura and Nancy Bridgeford assisted 
in identifying and reviewing tests. Barbara Hejtmanek and Laura Hopkins 
provided secretarial support. Ann Adelman and Nikki Sullivan, adult 
educators in Portland, provided helpful input through review of drafts of 
the Guide. And Vicki Spandel once again demonstrated her excellent 
skills in the final editing of the Guide. To all of you, thanks. 

Beverly Anderson 






Table of Contents

Acknowledgements
Table of Contents
Preface

Chapter I. Introduction

Chapter II. Decisions Prior to Test Selection
    What General Category of Literacy Assessment is of Concern?
    What Are the Purposes for Assessing?
    How and By Whom Will Test Results Be Used?
    What Examinee Characteristics Influence Test Selection?
    What Logistical Considerations Are Most Important?
    Summary

Chapter III. Test Review Procedures
    Preliminary Screening
    Technical Quality Review

Chapter IV. After Test Selection
    Setting Performance Levels
    Interpreting Test Results
    Reporting Test Results

Chapter V. Basic Literacy Assessment
    Reading Skills
    Writing Skills
    Available Tests of Basic Literacy Skills: Reading
    Available Tests of Basic Literacy Skills: Writing
    Martha: A Case Study in the Selection of Available
        Tests of Basic Literacy Skills

Chapter VI. Assessment of Everyday Literacy Activities
    Available Tests of Everyday Literacy Activities
    Nathan: A Case Study in the Selection of Available
        Tests of Everyday Literacy Activities

Chapter VII. Assessment of On-the-Job Literacy
    Available Tests of On-the-Job Literacy
    Phyllis: A Case Study in the Selection of Available
        Tests of On-the-Job Literacy Skills
    Summary

References

Appendix A. Suggested Readings on Definitions of
    Functional Literacy
Appendix B. Summary of Common Test Scores
Appendix C. Published Tests of Basic Reading Skills
Appendix D. Procedures for Scoring Writing Samples
Appendix E. Published Tests of Writing Skills
Appendix F. Tests of Everyday Literacy Activities
Appendix G. Clerical Tests Involving On-the-Job Literacy
    Skills
Appendix H. Publishers' Names and Addresses



List of Figures

Figure 1. Technical Review of Tests
Figure 2. Instructions for Assigning Ratings in the
    Technical Review of Tests Form
Figure 3. Setting Passing Scores Based on Test Inspection
Figure 4. Setting Passing Scores Using the Combination
    Method
Figure 5. Technical Review of Basic Reading Skills Tests
    (An Example)
Figure 6. Technical Review of Writing Skills Tests
    (An Example)
Figure 7. Technical Review of Everyday Literacy
    Activities Tests (An Example)
Figure 8. Short Test of Clerical Ability Results




Preface 



If you are a career counselor, community college staff member,
employment program director, teacher of adult basic education or
personnel staff member needing to assess the literacy ability of adults,
then this Guide was written with you in mind. This Guide, the first in a
projected series, is meant to help you determine when to use existing
tests to assess adult literacy. This is the most economical approach to 
testing. We recognize, however, that extant literacy tests often do not 
adequately reflect the multiple functions and social meanings of 
literacy. Therefore, future publications will show how to develop one's 
own literacy assessment methods to accommodate special needs. 

For purposes of this discussion, three general categories of literacy 
skills are distinguished: basic (generic) literacy skills, everyday 
skills, and job-related skills. The Guide offers specific examples of 
appropriate existing tests for measuring skills in each category. The 
status of literacy assessment in these three situations is also reviewed. 

The following examples illustrate the kinds of situations for which 
each chapter can offer guidance. 



CHAPTER                             EXAMPLE SITUATION

I.   Introduction                   You are beginning to establish or
II.  Decisions Prior to Test        are about to revise your general
     Selection                      procedure and philosophy of
                                    assessment.

III. Test Review Procedures         You have determined that an
                                    existing test is likely to be
                                    useful in your situation but you
                                    are not sure how to select from
                                    among the available ones.

IV.  After Test Selection           You have selected a test but are
                                    not sure how to determine what
                                    level of performance is acceptable
                                    and/or how to interpret results.

V.   Basic Literacy Assessment      You are a counselor at a community
                                    college responsible for assessment
                                    of potential enrollees. You want
                                    to assess their literacy skills to
                                    assist them in selecting courses.

VI.  Everyday Literacy              You are an instructor for a
     Activities Assessment          community college English as a
                                    Second Language Program. You need
                                    to assess the extent to which new
                                    students have the literacy skills
                                    necessary to perform everyday life
                                    tasks.

VII. On-the-Job Literacy            You are a CETA intake officer or an
     Assessment                     industrial personnel officer who
                                    needs to determine if a person can
                                    perform literacy tasks required
                                    for a specific job.



The impetus for this Guide comes from research being conducted by the
Functional Literacy Project at the Northwest Regional Educational
Laboratory. In 1975, the Laboratory published Tests of Functional Adult
Literacy: An Evaluation of Currently Available Instruments.* That
review highlighted the diverse definitions of functional literacy and the
consequent confusion about how to assess it. Although it seemed likely
that different definitions of literacy would be appropriate in various
settings, there was little rationale available for selecting one
particular definition or measurement method over another in a given
situation. To make that selection easier, the Functional Literacy
Project began to study how adults use literacy in their everyday lives.
To get an insider's perspective, the project relies on ethnographic
methods: actually living in a community, observing and participating in
as many daily activities as possible, and interviewing not only community
members, but public service agency and private business representatives.
As staff gain insight regarding the social and contextual variations in
functional literacy, they intend to prepare additional, practical
assessment guidelines.



*Nafziger, D., et al. Tests of Functional Adult Literacy: An Evaluation
of Currently Available Instruments. Portland, OR: Northwest Regional
Educational Laboratory, 1975.









CHAPTER I 



INTRODUCTION 



Individuals, groups and institutions define functional literacy in
various ways depending on their special needs and interests. For
example, public schools often view functional literacy in terms of
students' demonstrating established competencies required for everyday
life. These competencies may incorporate basic reading, writing and
computation skills. Examples include writing letters, locating
government agencies, following voting procedures, preparing tax forms,
completing job applications and giving oral directions.

The definition of functional literacy used in the National Health 
Survey of 1973, on the other hand, refers to the average performance of 
fourth graders as measured on the Brief Test of Literacy. And a national 
survey of employers considered literacy as the integration of 
mathematical and linguistic skills necessary for filling out a job 
application, doing filing, conducting routine correspondence, monitoring 
inventories and writing articulately. The Adult Performance Level 
Project took yet another tack, establishing functional literacy levels 
according to the test performance of people at various levels of income, 
occupational prestige and educational attainment. And the National 
Census characterizes a literate individual as one who has completed six 
or more grades in school and has "the ability to read and write a simple
message in any language."* 

This wide variety of definitions makes assessing functional literacy
particularly difficult. As might be expected, there are as many
assessment techniques and performance criteria as there are definitions,
and the task of matching the appropriate technique to a particular
assessment situation is complex. For example, a standardized reading
comprehension test designed for use in high school is often used to
determine if a person has the necessary literacy skills to work in an
industrial setting where the only literacy task is reading an
instructional manual. This is clearly a mismatch between situation and
approach.

To further complicate matters, the variety of purposes and meanings
ascribed to literacy are not carefully considered in many instructional
and assessment situations. For example, it is often assumed that adults
who have literacy skills will apply them as needed. Research suggests
that this is not at all the case; for many complex reasons, an adult will
apply literacy skills in some situations but not in others. Also,
"literacy problems" arise from mismatches among individuals' values or
expectations and those underlying literacy tests and assessments. For
example, personal ideas about who should read in church or when one
should write a memo rather than make a phone call can affect performance
on or reaction to a literacy assessment that touches on these matters.



*The reader interested in further discussion of alternative functional
literacy definitions is referred to Appendix A for an overview of
articles which highlight definitional problems, examine the issues
involved and discuss the benefits and limitations of various definitions.




Value systems are often deeply established, the result of social concepts 
and practices that have evolved throughout the history of a community or 
a group of people.*

Moreover, many individuals use alternative means to cope with the 
lack of literacy skills: use of nonprint media, for instance, or 
assistance from more literate people. Thus, literacy assessment is not 
always a true reflection of competence. Persons who have found a means 
of circumventing their lack of literacy skills may perform certain tasks 
at a higher level than expected.

Presently, literacy assessment research is not sufficiently advanced 
for us to give specific guidelines on handling these issues. However, 
this Guide, based on our research findings to date, provides the reader 
with our best information on selecting and using existing literacy 
assessment materials. We wish to emphasize the importance of clarifying
the purpose for assessment first. Criteria for "best available
instrument" include usability, validity and reliability for a given
purpose.



*Reder, S., & K. R. Green. Comparative aspects of the community 
structure of literacy: Annual report of the Functional Literacy 
Project. Portland, OR: Northwest Regional Educational Laboratory, 1981. 

Reder, S., & K. R. Green. Literacy as a functional component of social
structure in an Alaska fishing village. Paper presented at the 78th
Annual Meeting of the American Anthropological Association. Cincinnati,
December 1979.

Reder, S., & K. R. Green. Social meanings of literacy in an Alaska
fishing village. Paper presented at the First Annual Ethnography in 
Education Research Forum, University of Pennsylvania. Philadelphia, 
March 1980. 









CHAPTER II 



DECISIONS PRIOR TO TEST SELECTION 



This chapter addresses five important categories of decisions that 
must precede test selection. These decisions relate to 

• Category of literacy assessment 

• Purpose for testing 

• Uses and users of test results 

• Examinee characteristics 

• Logistics 



What General Category of Literacy Assessment is of Concern?

In identifying the general nature of literacy to be assessed, one is 
essentially setting forth an operational definition of literacy for the 
situation. The latter chapters of this Guide distinguish three 
categories of literacy skills: basic (generic) literacy skills, everyday 
literacy and on-the-job literacy. One must first decide which of these 
situations is appropriate. 

Tests designed to measure basic (generic) literacy skills cover 
skills and knowledge which are not specific to a given context, e.g., 
general vocabulary; phonetic analysis skills, including the ability to 
sound out new words; recognition of antonyms and synonyms; and the 
ability to write and comprehend simple sentences. Generally, these tests 
involve reading to gather information (reading to learn) rather than 
reading to perform a task (reading to do). 

Underlying the concept of basic skills measurement is the assumption 
that people can transfer skills from one situation to another. For 
certain people, such transfer does not seem to occur. For this and other 
reasons, the literacy assessment is best conducted within a restricted 
context or for a restricted purpose. 

This leads to our other two literacy assessment categories: everyday 
literacy and on-the-job literacy. Certain tests have been designed 
specifically to measure literacy within one of these two contexts. Such 
tests are far more useful and appropriate in these contexts than tests 
intended to measure a broad array of general literacy skills. By 
identifying the testing purpose and context in advance, one avoids being 
burdened with excess or irrelevant information: e.g., learning that a 
person can read at the fourth grade level when what really matters is 
whether the person can understand the instruction manual for operating a 
sewing machine. 

Specific skills within each literacy assessment category (generic, 
everyday, on-the-job) are further delineated in Chapters V through VII. 






What Are the Purposes for Assessing?



Nearly all literacy assessment is done to facilitate some educational
decision, and each decision requires different kinds of information. For
example, it would be inefficient to measure basic reading skills in
detail if an administrator needed only a general idea of whether students
could comprehend written materials. In other words, testing is only
efficient when one knows in advance what kinds of decisions are to be
made.

Purposes for assessment can be divided into three broad categories:
instructional management, entry-exit decisions and program planning and 
evaluation decisions. Let's take a closer look at each category. 

Instructional Management 

Instructional management includes (1) the diagnosis of individual 
learner strengths and weaknesses, (2) student placement and (3) 
educational and vocational student guidance. 

Diagnosis. The most frequent reason teachers test is to diagnose the
strengths and weaknesses of individual students. Tests and other 
performance indicators can help pinpoint a student's current level of 
development and assist the teacher in selecting the next appropriate 
instructional unit. A test used for diagnosis must relate closely to 
course content and should be very detailed. Results must be readily 
comprehensible to teachers and students. 

Placement. Whereas diagnosis helps determine which individual
instructional units are most appropriate for a student within a course,
placement tests determine which course or class best meets a student's
needs. Placement generally demands less detailed information than
diagnosis. While such information is of greatest value to program
administrators, instructors may also find placement test results useful in
determining the general level of students' skills.

Guidance. Guidance tests help predict the chances of success and
satisfaction in various educational and vocational programs. Such tests
need to measure skills which are predictive of success. Tests which give
the examinee an indication of his/her standing relative to other
examinees are often useful. In guidance testing, it is the individual
being assessed who makes decisions — with assistance from a guidance 
counselor. 



Entry-Exit Decisions 

Very carefully controlled testing can assist in entry or exit 
decisions in an educational or employment context. For example, literacy 
tests may be administered for the purposes of (1) selection for 
employment or (2) certification of competencies for occupational 
licensing or course completion. 






Selection. Selection tests help an employer determine the best
qualified person for a job or help an educator determine what applicants
are most likely to succeed in or benefit from an educational program.
Selection decisions have serious implications for a person's future and
are under close legal scrutiny. Therefore, a test used for selection
purposes must (1) clearly focus on those skills and knowledge essential
for future success, and (2) rank order students in terms of relevant
skills and knowledge to identify those most likely to succeed.

Certification. Tests often play an important role in certifying
those who have attained a minimum acceptable level of educational
development. For example, a test may be used to certify attainment of
minimally acceptable skills for receipt of a high school diploma.
Certain professions also use a test to certify competence to practice
that profession. In either case, examinees must pass the test to be 
certified minimally competent. Therefore, the test must thoroughly and 
accurately cover all relevant competencies. 



Programmatic Decisions

Tests also facilitate programmatic decisions. For instance, test 
data that reflect the needs of a group can serve as the basis for 
developing a new program, allocating resources or evaluating existing 
programs. Testing for these purposes falls into three categories: (1) 
survey assessment, (2) formative program evaluation and (3) summative 
program evaluation. 

Survey assessment. One established use of testing in education is
surveying student achievement and exploring trends over time to make 
programmatic planning decisions. This kind of testing is usually 
designed to raise questions rather than answer them. For example, why 
are writing scores gradually declining among entering community college 
students? Why are the national trends in literacy among adults 
changing? A survey assessment should encompass the most predominant or 
significant elements of instruction for the population being surveyed; it 
cannot measure all the skills and knowledge taught in any one location or 
class. Survey information, though necessarily broad in nature, helps 
educational administrators set policy and allocate resources. 

Formative evaluation. The goal of formative evaluation is to
determine which instructional units or features of a program (e.g.,
remedial reading) are effective, and which need revision. The decision
to revise is made by program teachers and administrators. Tests covering
specific interim and long-term program outcomes are administered during
the program's operation to help shape the program during its formative 
stages. 






Summative evaluation. Summative evaluation reveals a program's
overall merit, thus suggesting to teachers and administrators whether
that program should be modified, continued as is or terminated. Tests
designed to assess students' performance on final learning outcomes of a
program are typically part of such an evaluation. Such tests are
generally given prior to and following instruction, so that results can 
be compared and the impact of instruction determined. 
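
The pre/post comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name mean_gain and the scores are hypothetical, and a raw gain by itself does not show that instruction caused the change.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average post-minus-pre change for examinees tested both times.

    Scores are paired by position, so both lists must cover the same
    examinees in the same order.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post lists must be the same length")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical scores for five students before and after instruction.
pre = [12, 15, 9, 20, 14]
post = [18, 17, 15, 22, 19]
print(mean_gain(pre, post))
```

Pairing scores by examinee, rather than comparing two group averages, keeps students who took only one of the two tests from distorting the result.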



How and By Whom Will Test Results Be Used?

Before selecting a test, it is important to decide who will use the 
test scores. The kind of information needed by students may differ 
markedly from that needed by teachers, administrators or counselors. 
These groups differ not only in their interest, but also in their
capability and experience in interpreting test results. For example, a 
student or teacher may be interested only in using an individual 
student's score to determine what should be taught next. An 
administrator, on the other hand, may want only group scores to justify 
financial support for further literacy instruction. 

Deciding who will use the results is easy if the testing purpose has 
been carefully delineated. Deciding what type of test score to report, 
however, may not be so easy. Test scores fall into two general 
categories: norm referenced and criterion referenced. Norm referenced 
test scores reflect how one examinee compares to another examinee or 
group. The most common types of norm referenced scores are percentiles, 
grade equivalent scores and stanines. Criterion referenced scores 
compare an examinee's performance to prespecified criteria, without 
regard to the performance of others. Criterion referenced scores are 
frequently expressed as "percent of objectives mastered" and "percent of 
items correct for each learning objective." 
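
To make the distinction concrete, the following Python sketch computes one norm referenced score (a percentile rank) and two criterion referenced scores. The function names, the sample data and the 80 percent mastery cutoff are assumptions made for illustration; they do not come from any particular published test.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm referenced: percent of the norm group scoring below `score`,
    counting half of any ties (one common convention)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def percent_correct_by_objective(responses):
    """Criterion referenced: percent of items correct for each objective.

    `responses` maps an objective name to a list of True/False item results.
    """
    return {obj: 100.0 * sum(items) / len(items)
            for obj, items in responses.items()}


def percent_objectives_mastered(responses, mastery_cutoff=80.0):
    """Percent of objectives whose percent-correct meets the cutoff.
    The 80 percent default is an illustrative assumption."""
    by_objective = percent_correct_by_objective(responses)
    mastered = sum(1 for pct in by_objective.values() if pct >= mastery_cutoff)
    return 100.0 * mastered / len(by_objective)


# Hypothetical data for illustration only.
norm_group = [12, 15, 18, 20, 20, 23, 25, 27, 30, 34]
examinee = {
    "reads a bus schedule": [True, True, True, False, True],
    "completes a job application": [True, False, False, True, False],
}

print(percentile_rank(23, norm_group))
print(percent_correct_by_objective(examinee))
print(percent_objectives_mastered(examinee))
```

Note that the percentile rank changes whenever the norm group changes, while the criterion referenced figures depend only on the examinee's own responses and the chosen cutoff.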

Criterion referenced test scores are particularly useful for placing 
adults in a literacy program or an employment position. These scores 
illustrate the extent to which a person has acquired certain specific 
skills. The utility of norm referenced scores depends on the purpose for 
testing and on how well defined and appropriate the norming group is. 

Most published tests are normed on a national sample of people 
selected by grade level, age or occupation. Scores may be reported for 
subsets of the total norming group, e.g., rural vs. urban or Northeast 
vs. other regional groups. Norm referenced scores of this type may be 
useful in broad program assessment. For example, community college 
administrators may want to know how their students compare to community 
college students nationwide. 

In many situations, however, normative information is of limited use 
in testing adults. For example, suppose a survey showed that 60 percent
of the army personnel tested received scores below a grade equivalent of 
7.0.* What would this mean? Very little, because this score tells us 
nothing about what tasks the examinees can or cannot do, nor how their 
performance compares to that of other adults. 



*The implicit norm groups used for grade equivalent scores are children
in elementary and secondary schools. 






Norm referenced scores are frequently thought to imply certain 
standards. In particular, it is often assumed that an above average 
score is satisfactory, and a below average one unsatisfactory. By 
definition, half the people tested must be above average and half below. 
Whether "average" performance is satisfactory, below satisfactory or 
superior is another issue entirely. This decision requires determining 
what specific tasks a person needs to be able to do, and what levels of 
performance within each task are acceptable. 

Grade equivalent scores present a particular problem. On many 
standardized tests the grade equivalent score scale is constructed in 
such a way that a score as high as 6.0 may be equivalent to the chance 
score.** On such a test, results showing that an adult is reading at 
the fifth grade level only indicate that the test was too difficult. 
There is no way to know whether the person has some reading skills or was
simply guessing in response to test questions. 
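
As a rough illustration of the chance score, the Python sketch below computes the number of items one would expect to be answered correctly by guessing alone, and then checks whether a raw score stands clearly above that level. The binomial model of guessing and the two standard deviation margin are assumptions chosen for this sketch, not a published standard.

```python
def chance_score(n_items, n_choices):
    """Expected number correct from pure guessing on a multiple-choice test."""
    return n_items / n_choices


def is_distinguishable_from_guessing(raw_score, n_items, n_choices, margin=2.0):
    """Rough check: is the raw score more than `margin` standard deviations
    above the chance score? Guessing is modeled as binomial; `margin` is an
    illustrative assumption."""
    p = 1.0 / n_choices
    expected = n_items * p
    std_dev = (n_items * p * (1.0 - p)) ** 0.5
    return raw_score > expected + margin * std_dev


# Hypothetical 40-item test with 4 answer choices per item.
print(chance_score(40, 4))
print(is_distinguishable_from_guessing(12, 40, 4))
print(is_distinguishable_from_guessing(25, 40, 4))
```

A raw score near the chance score, whatever grade equivalent it converts to, says little about what the examinee can actually read.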

In short, national norm referenced scores may be of little use in 
adult literacy assessment. However, local norms may be worth 
consideration. Local norms, which can be developed on a criterion 
referenced or nationally normed test, allow you to compare individuals'
test performance to that of a group you select. For example, you may 
want to determine local norms for the incoming students in your 
particular community college.*** 
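
One simple way to build local norms is to tabulate the percentile rank of each raw score within the group you select. The Python sketch below assumes a hypothetical list of raw scores for one term's incoming students; the function name and the tie-handling convention are illustrative choices.

```python
from collections import Counter


def local_norm_table(local_scores):
    """Map each observed raw score to its percentile rank in the local group
    (percent scoring below, plus half of those with the same score)."""
    counts = Counter(local_scores)
    total = len(local_scores)
    table = {}
    below = 0
    for score in sorted(counts):
        table[score] = 100.0 * (below + 0.5 * counts[score]) / total
        below += counts[score]
    return table


# Hypothetical raw scores for one term's incoming students.
incoming = [14, 18, 18, 21, 22, 22, 22, 25, 27, 31]
norms = local_norm_table(incoming)
for raw in sorted(norms):
    print(raw, norms[raw])
```

A table built from ten scores is shown only to keep the example short; a usable set of local norms would need a much larger group.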

Various norm referenced and criterion referenced scores are described 
in more detail in Appendix B, along with the advantages and disadvantages 
of each. Additionally, many test manuals and measurement textbooks 
describe the differences among these scores. 



What Examinee Characteristics Influence Test Selection?

Too many tests attempt to measure technical literacy without regard 
for the functions and social meanings associated with literacy in the 
examinee's world. If an examinee is accustomed to reading only the 
newspaper and novels, a reading comprehension test using reading passages 
from scholarly articles may not allow the examinee to display his or her 
true ability to comprehend written material. Test materials unrelated to 
examinees' past experience, or lacking cultural or social significance,
often significantly mask ability. 



**The chance score is the number of items one would expect a person to 
answer correctly on a multiple choice test merely by guessing. 

***Helpful information on developing local norms can be found in Chapter 
5 (Local Norms) of the Evaluator's References, Vol. II, ESEA Title I 
Evaluation and Reporting System. Mountain View, CA: RMC Research 
Corporation. August 1980. 






Test format is also crucial. For example, a recent Indochinese
immigrant may never have seen a multiple-choice test. He or she may be
able to read a paragraph orally and respond to oral questions, but not do
well on a multiple-choice test simply because the format is unfamiliar.
An alert examiner can sometimes sense when certain test characteristics
may distort results. In some cases, an examinee may be taught the
mechanics of test taking prior to testing, thus eliminating this
problem. In other cases, it may be best to use another assessment
approach.



What Logistical Considerations Are Most Important? 

Certain practical logistical factors must be considered. Probably 
the most important of these are test length, type of administration and 
cost. 

First, the test must be of an appropriate length for a given 
situation. A series of tests designed for frequent classroom diagnostic 
use, for example, must be fairly short, whereas a test used for a
once-a-year survey assessment may be longer. 

Second, some tests must be individually administered. The additional 
costs of individual (vs. group) administration need to be carefully 
weighed against the increased value of the resulting scores. 

Third, when considering cost, one must first note whether test 
manuals, booklets and answer sheets may be purchased separately. Some 
items may be reusable (e.g., test booklets) whereas others are not (e.g., 
answer sheets). Further, having the test scored by the publisher may be 
costly, depending on the type of reports desired. Usually a variety of 
reporting options are available. 
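
The effect of reusable versus consumable materials on cost can be estimated with simple arithmetic. The Python sketch below assumes that booklets are shared across groups tested at different times while every examinee needs a fresh answer sheet at each testing. All prices and quantities are hypothetical, and publisher scoring fees are left out.

```python
def materials_cost(n_examinees, sessions, max_group_size,
                   manual_price, booklet_price, answer_sheet_price):
    """Estimate materials cost for a testing program.

    Assumes one manual, enough reusable booklets for the largest group
    tested at one time, and one answer sheet per examinee per session.
    """
    booklets_needed = min(n_examinees, max_group_size)
    answer_sheets_needed = n_examinees * sessions
    return (manual_price
            + booklets_needed * booklet_price
            + answer_sheets_needed * answer_sheet_price)


# Hypothetical figures: 60 examinees tested twice, in groups of up to 30.
estimate = materials_cost(
    n_examinees=60, sessions=2, max_group_size=30,
    manual_price=5.00, booklet_price=1.50, answer_sheet_price=0.10,
)
print(round(estimate, 2))
```

Under these assumptions the booklets are the largest item, so testing in smaller groups over more sittings lowers the materials cost at the expense of examiner time.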



Summary 

This chapter has presented key decisions a person must make before 
identifying a specific test or assessment approach to use to assess
literacy. Those decisions relate to (a) the categories of literacy, 
(b) the purpose for testing, (c) the uses and users of test results, 
(d) examinee characteristics, and (e) logistics. 







CHAPTER III 



TEST REVIEW PROCEDURES 



Once the issues raised in Chapter II are resolved, one is ready to 
select or develop an assessment approach which will provide the desired 
information. Basically there are two choices: using an existing test or 
developing a new one. The most economical approach is, of course, to use 
an existing test — provided a high quality, appropriate instrument can be 
obtained. Existing tests are available from publishers or from people 
who have developed their own for a specific purpose but have never 
published them. While published tests are generally extensively 
reviewed, pilot tested and revised before publishing, they may not have 
all the characteristics important for a particular situation. One must 
carefully review either published or unpublished tests, therefore, to be 
sure they can provide useful results. 

This chapter provides general guidelines for reviewing existing 
tests. Chapters V-VII focus on the application of these guidelines in 
specific situations. 



Preliminary Screening 

The first step in test selection is to locate existing tests and 
identify those that come close to measuring the specific skills one
desires to measure. The appendices referenced in Chapters V-VII of this 
handbook list tests which will serve as a starting point for this 
search. Several reference books are also useful: 

The Mental Measurement Yearbook (Buros, 1978) reviews hundreds of
existing tests. It is updated about every six years. Tests in Print
(Buros, 1974) provides descriptive information on all published tests.
It is updated about every ten years. A recent publication entitled A
Consumer's Guide to Writing Assessment (Bridgeford & Stiggins, 1981)
provides very helpful descriptions of existing writing tests, and Tests
of Functional Adult Literacy: An Evaluation of Currently Available
Instruments (Nafziger, et al., 1975) provides comprehensive technical
reviews of many existing instruments.

Periodically, documents in the ERIC* system contain test reviews. 
Another helpful source is the monthly publication entitled News on Tests,
available from the Educational Testing Service (ETS).** ETS also houses
a test collection. They will provide, upon request, summaries of 
existing tests categorized by content area. Helpful suggestions can also 
be obtained from community college program personnel (e.g., Adult Basic 
Education, high school equivalency, career counseling, refugee or 
tutoring programs), university libraries, CETA offices and state 
departments of education or directly from publishers. 



*ERIC is the Educational Resources Information Center, an educational
information data base providing access to fugitive documents.

**For further information on this publication and the Test Collection
Service, contact Test Collection, Educational Testing Service, Princeton,
NJ 08541.




It is not easy to locate unpublished tests since no central source 
exists. They are best located by word of mouth and checking sources such 
as ERIC. Often the unpublished tests are associated with a research
study reported in the ERIC system. 

Before reviewing a test itself, one should first review the
descriptive information provided through the publisher or one of the 
reference books mentioned in the preceding paragraphs. Once a small 
number of relevant tests have been identified, the more time consuming 
review for technical quality can begin. 

When reviewing test references, keep in mind that tests are usually 
revised every five to ten years. In almost all cases, the most recent 
edition should be considered. Also, many tests are part of a series 
spanning a wide age range (e.g., K-adult) . Only the level appropriate 
for the group with which it is to be used need be reviewed. In some 
cases, though, you may want to review several levels to determine which 
is best for your situation. For example, in an adult basic education 
class, a diagnostic reading test designed for junior high students may
cover precisely the correct level of skills. 

An important consideration is whether the test is norm referenced,
criterion referenced or both. One type of score may be more useful than
the other for a given purpose. If a norm referenced
score is desired, it is particularly important to determine if the
norming group is relevant for the situation.

Information on the intended uses of the test, skills and knowledge 
measured, response format (e.g., multiple choice), administration time 
and available scoring procedures (hand scored vs. machine scored) will 
also help you determine quickly whether the test merits further
consideration. 

This preliminary screening should yield a handful of potentially 
useful tests. The next step is a careful technical review. 



Technical Quality Review 

A technical quality review of existing tests demands a set of 
criteria. Three general categories of criteria should be used when 
reviewing functional literacy tests: validity, usability and 
reliability. Validity refers to the extent to which a test measures the
skills and knowledge the user intends it to measure. Usability refers to 
the extent to which a test is suitable for the context in which it will 
be used. Reliability refers to the extent to which a test measures a 
trait consistently. 

When reviewing adult functional literacy tests, one should ask 
specific questions relating to each of these three categories. Suggested 
questions are given on the rating form in Figure 1. Following the form 
are instructions for assigning points (Figure 2). Extensive rationales 
for these questions are not provided. In cases where the reasons are not 
intuitively obvious, the reader is referred to Guidelines for Selecting 
Basic Skills and Life Skills Tests (Anderson, Stiggins & Hiscox, 1980) or
college level measurement textbooks. 







Once tests have been rated on each of the technical quality 
questions, an overall judgment must be made. In almost all cases, the 
two critical questions will be: 

• Do the test items measure the specific learning objectives or
skills that need to be assessed? (Question 1 under Validity) 

• Is the appropriate type of score reported for the intended use? 
(Question 6 under Usability) 

If a test does not meet these criteria, the other aspects of its quality
will be of little consequence. Scores of tests which meet these criteria
should be summed, and the test with the highest score selected (assuming
all other criteria are considered of equal importance).
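
The selection rule just described can be sketched in a few lines of Python. The sketch is illustrative only: the class and function names are hypothetical, the ratings are invented, and treating a rating of zero on Usability Question 6 as failing that criterion is an assumption, since the text does not say how the criterion is to be judged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RatedTest:
    name: str
    measures_objectives: bool  # Validity question 1, judged from the chart
    score_type_points: int     # Usability question 6 (0, 1 or 2)
    other_points: int          # points summed over all remaining questions


def select_test(ratings: List[RatedTest]) -> Optional[RatedTest]:
    """Screen on the two critical questions, then pick the highest total."""
    eligible = [
        r for r in ratings
        if r.measures_objectives and r.score_type_points > 0
    ]
    if not eligible:
        return None  # no reviewed test meets both critical criteria
    return max(eligible, key=lambda r: r.score_type_points + r.other_points)


# Invented ratings for three hypothetical tests.
ratings = [
    RatedTest("Test A", True, 2, 20),
    RatedTest("Test B", False, 2, 27),  # most points, but wrong content
    RatedTest("Test C", True, 1, 18),
]
best = select_test(ratings)
print(best.name if best else "No suitable test")
```

In this invented case, Test B is set aside despite its point total because it does not measure the objectives of interest.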






FIGURE 1 
TECHNICAL REVIEW OF TESTS



Rate each test of interest using the rating system described in the
instructions in Figure 2. 



Validity 

1. Do the test items measure the specific learning objectives or skills
that need to be assessed? 

List learning objectives or skills to be assessed in the left hand 
column of the chart below. Put the name of each test being reviewed 
in a numbered box at the top of one of the right hand columns. 
(Repeat on each page of the test review chart.) Under each test
indicate the number of items measuring each objective. Review the 
test manual or the actual test to determine if each objective or 
skill is measured. The publisher may indicate in the manual what 
skill each test item measures. If not, examine each test item to 
determine what skill or objective it measures. 



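                                          Test Name

Learning objectives or
skills to be assessed            1. ____________   2. ____________

1. ___________________________   _______________   _______________

2. ___________________________   _______________   _______________

3. ___________________________   _______________   _______________

4. ___________________________   _______________   _______________

5. ___________________________   _______________   _______________

6. ___________________________   _______________   _______________
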

For the remainder of the questions, assign points as described in the 
Instructions for Assigning Ratings in the Technical Review of Tests Form 
(Figure 2). The number in parentheses after each question indicates the
number of possible points. 






FIGURE 1 (continued) 

Test Name



2. Were the test items developed 
in a systematic and rigorous 
manner so that the content is 
adequate and bias is 
minimized? (6) 



3. Were any empirical procedures 
used for screening or selecting 
items to ensure that items are 
measuring what they were 
designed to measure, are
understandable, contain 
reasonable answers, and are 
free of ambiguous alternative 
answers and unnecessarily 
complex language? (3) 



4. Is the validating group
representative of the 
population with which the test 
is to be used? (2) 



5. Are any special validity 
studies reported or 
specifically referenced? (2) 




Usability 



1. Are the test items suitable for
adults with limited literacy
skills? (2)



2. Are instructions to the test 
administrator clear and 
complete? (2) 






FIGURE 1 (continued) 

Test Name



3. Are instructions to the
examinee written in clear and
understandable terms? (2)



4. Is the test formatted clearly? 
(2) 



5. Is there a simple way for
examinees to record their 
responses? (2) 



6. Is the type of score reported 
useful for my situation? (2) 



7. Is the process of converting 
raw scores simple and does it 
yield scores which are easily 
interpreted? (2) 



8. Is the amount of time required 
for testing appropriate? (2) 



Reliability 


1. Is reported reliability for 
major subtests and/or total 
test scores sufficiently high? 
(3) 

2. Are the scoring procedures 
clear and complete, thus
ensuring reliable scores? (2) 






FIGURE 2 

INSTRUCTIONS FOR ASSIGNING RATINGS 
ON THE TECHNICAL REVIEW OF TESTS FORM 



I. Validity: the extent to which the test measures what the user
intends it to measure.

1. Do the test items measure the specific learning objectives or
skills that need to be assessed?

To answer this question, list the learning objectives to be 
assessed. Then go through the test item by item and determine 
which objective, if any, each item measures. Adequate coverage 
requires at least three items measuring a given objective. If 
the test measures 75 percent of your objectives, the coverage 
may be adequate for placement decisions. It would not 
necessarily be adequate for diagnostic decisions. 
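
The coverage check described above lends itself to a simple tally. The following Python sketch is one possible way to do it, not part of the rating form: the function name, the objective labels and the item assignments are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def objective_coverage(
    item_objectives: List[Optional[str]],
    objectives: List[str],
    min_items: int = 3,
) -> Tuple[float, Dict[str, int]]:
    """Return the share of objectives adequately covered and items per objective.

    item_objectives holds, for each test item, the objective it measures
    (None when the item measures none of the listed objectives).
    """
    counts = Counter(o for o in item_objectives if o is not None)
    per_objective = {o: counts[o] for o in objectives}
    covered = sum(1 for n in per_objective.values() if n >= min_items)
    return covered / len(objectives), per_objective


# Invented example: four objectives, twelve items reviewed one by one.
objectives = ["decoding", "literal comprehension", "inference", "vocabulary"]
item_objectives = (
    ["decoding"] * 3
    + ["literal comprehension"] * 4
    + ["inference"] * 2
    + ["vocabulary"] * 3
)
share, per_objective = objective_coverage(item_objectives, objectives)
print(f"{share:.0%} of objectives have at least three items")
```

Here three of the four invented objectives have three or more items, so the test reaches the 75 percent level mentioned above; the objective with only two items would still be a concern for diagnostic use.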

2. Were the test items developed in a systematic and rigorous
manner so that the content is adequate and bias is minimized?
(6 possible points)

Award points for this criterion on the following basis: 

1) Relation of items to specific objectives:

2 points if test items generally relate to specific
objectives or criteria (tasks from a task analysis)

1 point if items relate to general content areas 

0 points if items do not generally relate or if objectives 
or criteria are lacking 

2) Item development procedure: 

2 points if procedures for developing test specification and 
items are described in detail 

1 point if reference is made to use of a specific rigorous 
item development procedure but details are not given 

0 points if no information is provided on item selection 







FIGURE 2 (continued) 



3) Bias review procedures: 

2 points if statistical analyses and reviews for racial, 
sexual and cultural perspectives were conducted 

1 point if either statistical analyses or technical reviews
were conducted 

0 points if neither bias review procedure was used 



3. Were any empirical procedures used for screening or selecting 
items to ensure that items are measuring what they were designed 
to measure, are understandable, contain reasonable answers, and 
are free of ambiguous alternative answers and unnecessarily 
complex language? (3 points possible) 

Empirical procedures include item analysis, review by juries of
experts, review of item difficulties, criterion-group analyses
or factor analyses. Award points on the following basis:

3 points if more than one appropriate method was conducted
and reported in detail

2 points if more than one method was used but insufficient 
reporting is done to assess its appropriateness, or if one 
appropriate method is reported in some detail 

1 point if it is stated that one method was used 

0 points if no information is given 

4. Is the validating group representative of the population with 
which the test is to be used?* (2 points possible)

Include the following considerations in the evaluation of 
validating group representativeness: 

(1) Were both males and females included in the validating 
group? 

(2) Were all major ethnic groups represented in the validating 
group? 



*The rating system used on this criterion assumes that the test user is
working with a broad cross section of adult learners and wants assurance 
that it is appropriate for such a range of people. There may be cases 
where a more parochial test is desired. This criterion would then be 
modified to fit that population. 






FIGURE 2 (continued) 



(3) Was the validating group a nationally representative sample 
in terms of population density characteristics (e.g., urban,
suburban, rural, etc.), and geographic orientation of
the age group for which the test was designed? 

(4) Was the sample obtained through cluster, stratified or 
random rather than incidental sampling?

(5) Was the validating done less than five years ago?

Award points based on the following: 

2 points if answers to at least four of these questions 
were "yes" 

1 point if answers to two or three questions were "yes"

0 points if fewer than two answers were "yes" or insufficient
information was provided to determine the answers.
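
The point rule for this criterion can be written as a small Python function. This is only a sketch: the function name is hypothetical, and counting an unanswerable question (recorded here as None) the same as a "no" is an assumption about how to handle insufficient information.

```python
from typing import List, Optional


def representativeness_points(answers: List[Optional[bool]]) -> int:
    """Convert answers to the five questions into 0, 1 or 2 points.

    Use None where the manual gives too little information to answer.
    """
    if len(answers) != 5:
        raise ValueError("expected one answer for each of the five questions")
    yes_count = sum(1 for a in answers if a is True)
    if yes_count >= 4:
        return 2
    if yes_count >= 2:
        return 1
    return 0
```

For example, a manual that answers "yes" to questions 1 and 2 but says nothing about sampling or the date of the study would receive 1 point under this sketch.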

5. Are any special validity studies reported or specifically
referenced? (2 points possible)

Consider these two questions when rating this
criterion:

(1) Has anyone examined the relationship between this test and 
other measures of adult literacy? 

(2) Are any studies referenced or reported which examine how 
this test predicts success in other educational programs, 
life survival tasks or jobs? 

Award points as follows: 

2 points if at least one study is referenced for both 
questions 1 and 2 above 

1 point if only one study is referenced for either question
1 or 2 above

0 points if no studies are referenced for either question. 







FIGURE 2 (continued) 



II. Usability: the extent to which a test is suitable for the context in
which it will be used.

1. Are the test items suitable for adults with limited literacy
skills? (2 points possible)

Award points on the following basis: 

2 points if all items appear inoffensive, reasonably 
appropriate in difficulty and intellectually stimulating 
(regardless of test content) 

1 point if most items appear appropriate 

0 points if many items are judged inappropriate because 
(1) they are inappropriate or offensive to special groups 
or (2) they are too dull or insultingly simplistic. 

2. Are instructions to the test administrator clear and complete? 
(2 points possible) 

Award points on the following basis: 

2 points if the instructions to the administrator clearly 
and completely describe (1) the materials to have 
available, (2) the conditions in the room where testing is
to occur, and (3) time allotments for testing, if any

1 point if any of the above is unclear 

0 points if more than one of the above is unclear. 



3. Are instructions to the examinee written in clear and
understandable terms? (2 points possible) 

Award points on the following basis: 

2 points if (1) instructions clearly and precisely 
describe the examinee's task and (2) sample items are
included which effectively illustrate the task 

1 point if instructions clearly and precisely describe the 
examinee's task but no sample items are provided

0 points if instructions are unclear, incomplete or 
nonexistent. 






FIGURE 2 (continued) 



4. Is the test formatted clearly? (2 points possible) 

Test layout should be examined for effective use of perceptual
organizers, such as adequate white space, regularity of item
form, symmetry, clarity and continuity.

Award points on the following basis: 

2 points if (1) test page layout is clear and helpful and 
(2) print and illustrations in printed tests and sound in 
auditory or taped tests are high quality 

1 point if only (1) or (2) above applies

0 points if layout is unclear or confusing or quality of 
print or tapes is low. 



5. Is there a simple way for examinees to record their responses?
(2 points possible) 

Award points on the following basis: 

2 points if response is especially simple for 
examinee — e.g., oral responses, or marking or writing 
directly on test form 

1 point if test uses standard separate answer sheets 

0 points if test is complicated by the need for more 
than one step to get from item to answers.



6. Is the type of score reported useful for my situation?
(2 points possible)

Award points on the following basis: 

2 points if the score reported is precisely the type needed 

1 point if a usable score can be obtained even though it is
not exactly what is desired 

0 points if the desirable score is unattainable. 






FIGURE 2 (continued) 



7. Is the process of converting raw scores simple and does it yield
scores which are easily interpreted? (2 points possible)

Award points on the following basis: 

1 point if the scores are reported in reference to a
clearly identified norm group, a level of competency on
clearly identified skills or learning objectives or in
terms of meaningful raw scores (e.g., a words-per-minute
reading rate or a precise report of letters for which the
examinee could not give the sound)

0 points if converted scores are ambiguous or conversion is
lacking for raw scores not meaningful in themselves.

And:

1 point if the score conversion procedure is simple,
involving one easy-to-understand step, such as a clear
chart or table, or no conversion is necessary because the
raw scores are interpretable.

0 points if the process of achieving final scores is
complicated by lack of clear or simple tables or graphs or
if it requires two or more steps to get from the raw to the
converted scores (e.g., using one table to get into another
table).



8. Is the amount of time required for testing appropriate?
(2 points possible) 

Award points on the following basis: 

2 points if the amount of time matches well with the 
available time 

1 point if adjustments can be made by either using only
part of the test or adjusting the testing time limits 

0 points if it will be extremely difficult to accommodate 
the required testing time. 






FIGURE 2 (continued) 



III. Reliability: the extent to which the test measures a trait with
consistency. 



1. Is reported reliability for major subtests and/or total test 
scores sufficiently high? (3 points possible) 

One or more of three different types of reliability may be 
reported: alternate form reliability, test-retest reliability,
and internal consistency estimates. 

Assign points on the following basis:

Reliability        Points Awarded

.90 or above             3
.75 to .89               2
.65 to .74               1
.64 or below             0

When more than one type of reliability is reported, use the most 
frequent rating across the different reliabilities. 
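
The table and the "most frequent rating" rule can be expressed as follows. This Python sketch is illustrative: the function names are hypothetical, and breaking a tie by taking the lower rating is an assumption, since the instructions do not say what to do when no rating is most frequent.

```python
from collections import Counter
from typing import List


def reliability_points(coefficient: float) -> int:
    """Map one reported reliability coefficient to points per the table."""
    if coefficient >= 0.90:
        return 3
    if coefficient >= 0.75:
        return 2
    if coefficient >= 0.65:
        return 1
    return 0


def overall_reliability_points(coefficients: List[float]) -> int:
    """Use the most frequent rating across the reported reliabilities."""
    ratings = Counter(reliability_points(c) for c in coefficients)
    highest_count = max(ratings.values())
    # Assumption: when ratings tie, take the more conservative (lower) one.
    return min(r for r, n in ratings.items() if n == highest_count)


# Invented coefficients: alternate form, test-retest, internal consistency.
print(overall_reliability_points([0.91, 0.82, 0.78]))
```

With the invented coefficients shown, two of the three fall in the .75 to .89 band, so the test would be awarded 2 points.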



2. Are the scoring procedures clear and complete, thus ensuring 
reliable scores? (2 points possible) 

Award points on the following basis: 

2 points if (1) scoring procedures are clear and complete
and (2) scoring of objective items is done using a scoring
guide, template, stencil or other straightforward process,
or machine scoring is available, and scoring of subjective
items is done using rigorous training and scoring guides.

1 point if only (1) or (2) above applies 

0 points if neither (1) nor (2) above applies.






Chapter IV 

After Test Selection 






CHAPTER IV 
AFTER TEST SELECTION 



Once a test is selected, one must determine acceptable levels of 
performance, and decide how to report results to ensure they are used for 
their intended purpose(s).



Setting Performance Levels 

Keep in mind that setting performance levels is secondary to 
determining test content. If the content tested is inappropriate (e.g., 
using a standardized reading comprehension test for making decisions 
about job placement), then performance levels are of little importance. 

Setting a level of acceptable performance on a test is separate from
obtaining a test score. Whether scores are norm or criterion referenced, 
the level of satisfactory performance for a specific situation must be 
determined. 

Setting standards of acceptable performance is always an arbitrary 
decision, but it should at least be an informed arbitrary decision. 
There is no magical answer concerning what level of performance is 
satisfactory; the decision must rest on the opinions of experienced, 
knowledgeable people who understand the nature of the skills or knowledge 
being tested and the general capabilities of the examinees in question. 

There are two methods of setting performance levels: reviewing 
actual student performance, and reviewing test items.* Either approach 
calls for human judgment. 

One method, which relies on inspection of the test, is described in 
Figure 3. In this method, judges are provided with a set of all the 
items on a test. Looking at each item, they determine how many wrong 
alternatives a minimally qualified examinee should be able to identify. 
The number of remaining alternatives is used to determine item 
difficulty. The score indicating acceptable performance is based on the 
average difficulty of all test items. 

Other standard setting methods call for judges to review the scores 
of selected groups of students. Knowing how certain students performed,
the judges then decide what level of performance is acceptable for a 
given purpose or situation.



*See Handbook for Proficiency Assessment: Section IV-Passing Scores.
Berkeley, CA: Educational Testing Service, December 1979 for further 
information on the methods described on the next pages as well as 
additional methods. 






The simplest method of this type is the Borderline Method. A group 
of students is identified as being so close to the borderline between 
mastery and nonmastery that a teacher or counselor cannot be certain if 
they need supplemental instruction or not. Approximately 100 such 
borderline students are then tested, and the median score attained by 
these students is selected as the passing score. 
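
The arithmetic of the Borderline Method is simply a median. As a minimal Python sketch (the function name and the scores are invented for illustration, and far fewer than the roughly 100 students suggested above are shown):

```python
from statistics import median
from typing import Sequence


def borderline_passing_score(borderline_scores: Sequence[float]) -> float:
    """Return the median score of the students judged to be borderline."""
    if not borderline_scores:
        raise ValueError("at least one borderline score is required")
    return median(borderline_scores)


# Invented scores for a handful of borderline students.
scores = [14, 17, 15, 16, 15, 18, 13]
print(borderline_passing_score(scores))
```

The median is used rather than the mean so that one or two unusually high or low borderline scores do not pull the passing score.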

It is possible to combine these two standard setting methods. Figure 
4 describes one way this might be done. 

The previous example focused on minimal competency levels. It is 
possible to have more than one competency level: e.g., one level to 
distinguish superior from average performance, another to distinguish 
average from incompetent performance. The methods discussed earlier 
could be easily adapted for setting multiple levels. 

In setting performance levels, the crucial question is: Whose 
opinions should guide what's termed "acceptable"? For example, suppose 
test results will determine whether a person is ready to undertake a job 
that requires reading a manual. Then the opinions of people who 
supervise that position may be particularly relevant. In some cases, 
groups or individuals may set quite different standards for acceptable 
performance. For example, suppose that incoming freshmen in all 
community colleges in a state are tested. Though all students take the 
same test, staff at each college may set their own standards for 
acceptability. 






FIGURE 3 



SETTING PASSING SCORES BASED ON TEST INSPECTION* 



DESCRIPTION OF
THE PROCEDURE:



Judges are provided with a set of all items in the 
test. Looking at each item, they decide for each 
question the number of wrong alternatives a minimally 
qualified person would be expected to identify as 
obviously wrong. The number of alternatives remaining 
constitutes the set from which the examinee is 
expected to guess. The passing score of the test is 
the average judged difficulty of all items. 



NUMBER OF
JUDGES:



Between seven and 25 judges representing perspectives 
deemed important should be used. 



ADVANTAGES OF 
THE METHOD: 



1. The procedure is independent of the number of
students taking the test. A passing score can be
calculated for even very small groups of students.

2. The procedure can accommodate participation by a
broad cross-section of judges (e.g., teachers,
administrators, parents, students, employers).

3. The procedure is based on close scrutiny of test
items.

4. The procedure closely parallels the decision
processes of test takers; each alternative for
each item is individually considered.



DISADVANTAGES OF
THE METHOD:



1. The procedure is blind to actual performance on
the test.

2. The passing score can be too high or too low when 
there are a disproportionate number of judges 
with the same interest or bias. 

3. The procedure must be repeated for each form or 
test when different forms or tests are used that
are not equated. 

4. More time and people are needed to make judgments 
than with performance-based procedures. 

5. The procedure can only be used with multiple 
choice tests. 



*This method is known as the Nedelsky method. The information in this
figure is taken from the Handbook for Proficiency Assessment:
Section IV-Passing Scores. Berkeley, CA: Educational Testing Service,
December 1979.






FIGURE 3 (continued) 



PROCEDURES FOR TRAINING JUDGES 



1. Orally review the purpose of the tests and standard-setting exercise,
including presentation and discussion of the purpose of the
standard. Be sure to distinguish between average performance and
minimally acceptable performance.

2. Distribute practice questions. 


3. Explain to judges that this method requires a group of knowledgeable 
judges to inspect each question and make a judgment about each wrong 
alternative. Each judge must decide whether a hypothetical examinee 
who just barely meets the definition of a minimally acceptable 
performance for the situation under consideration could be expected 
to eliminate the wrong alternative.

4. Review the practice questions and as a group decide which 
alternatives could be eliminated by a minimally competent person. 
Complete a judge's recording form for the practice questions. 

5. After the task is clear to the judges, pass out the test booklets and 
as a group decide which alternatives could be eliminated. For each 
wrong alternative where there is not unanimous agreement, ask one 
judge from each viewpoint to give a brief explanation of his/her 
reasons. The purpose of such an exchange is not to force consensus
but to allow different points of view to be heard. Use a judge's 
recording form to allow each judge to indicate his/her decision of 
the number of alternatives which can be omitted. 

6. Go through each question on the test in the way described above. 



7. When all test items have been reviewed, have judges tally each column 
of circled numbers. Transfer the column totals to the corresponding 
blanks by the probabilities and multiply. The sum of those 
multiplications is that judge's passing score estimate. Average all 
of the judges' estimates to obtain the group's recommended passing 
score. 
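
The tallying in step 7 can be checked with a short calculation. The Python sketch below follows the judge's recording form for four-choice items, using the probabilities printed on that form; the function names and the circled numbers are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

# Chance of answering a four-choice item correctly by guessing after
# eliminating 0, 1, 2 or 3 wrong alternatives (as printed on the form).
PROBABILITIES: Dict[int, float] = {0: 0.25, 1: 0.33, 2: 0.50, 3: 1.00}


def judge_estimate(eliminated: List[int]) -> float:
    """One judge's passing score: tally each column, multiply, and sum."""
    if any(k not in PROBABILITIES for k in eliminated):
        raise ValueError("each circled number must be 0, 1, 2 or 3")
    totals = Counter(eliminated)
    return sum(totals[k] * p for k, p in PROBABILITIES.items())


def recommended_passing_score(judges: List[List[int]]) -> float:
    """Average the judges' estimates to get the group's passing score."""
    estimates = [judge_estimate(j) for j in judges]
    return sum(estimates) / len(estimates)


# Invented circled numbers for a three-item test, one list per judge.
judges = [
    [3, 2, 2],  # judge 1
    [2, 1, 3],  # judge 2
    [3, 3, 2],  # judge 3
]
print(round(recommended_passing_score(judges), 2))
```

The result is expressed in raw-score units, that is, the number of items a minimally qualified examinee would be expected to answer correctly.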






FIGURE 3 (continued) 



Example Test Items:

Directions: Look carefully at the following lists of words. Each row
has one word that is spelled wrong. Mark your answer sheet to show the
word in each row that is spelled wrong.

1.   a. house     b. woman     c. suny      d. have

2.   a. wash      b. second    c. zipper    d. rownd

3.   a. lesson    b. appel     c. toys      d. because



Judge's Recording Form:

Test Item        Circle Number of
Number           Choices Eliminated

    1              0    1    2    3

    2              0    1    2    3

    3              0    1    2    3

Total 0's = ____  x  .25 = ____

Total 1's = ____  x  .33 = ____

Total 2's = ____  x  .50 = ____

Total 3's = ____  x 1.00 = ____

                     Sum = ____






FIGURE 4 



SETTING PASSING SCORES USING THE COMBINATION METHOD* 



DESCRIPTION OF 
THE PROCEDURE: 



Judges are provided with a description of each skill 
covered in the test, a sample item measuring that skill
taken from the test, performance data on the skill gathered 
from examinees and the projected number of examinees who 
would fail given several different passing scores. Each 
judge considers this information and decides what 
percentage of the items in that skill area a "proficient" 
person should answer correctly. 



JUDGES:



Either a small group of judges representing various
perspectives can be used to set passing scores or a larger
sample of the constituencies can participate in the
decision making by responding to a mailed survey.



ADVANTAGES OF
THE METHOD:

1. The combination of representative items and local 
performance levels provides judges with the most 
comprehensive set of information possible. 

2. The procedure is independent of the number of people 
taking the test. The passing score can be calculated 
for even very small groups. 

3. The procedure can accommodate participation by a broad 
cross-section of people (e.g., teachers, 
administrators, parents, students, employers). 

4. Judges' decisionmaking time is reduced because only 
one judgment per subtest rather than one per item is 
required. 

5. The procedure can be adapted easily for mailing as a 
survey to receive many views. 



DISADVANTAGES OF
THE METHOD:



1. Item specifications must narrowly define the
range of content and difficulty of items used in the
test; otherwise the sample item may be misleading to
the judges.

2. Performance data must be recent and must adequately
represent all examinees for whom the performance level
is being set. 

3. The passing score can be too high or too low when 
there are a number of judges with the same interest or 
bias. Returns from mailed surveys are particularly 
vulnerable to special interests. 

4. More time and people are needed to make judgments than 
with performance-based methods. This method is 
particularly time consuming if a mailed survey is used. 



*Taken from Handbook for Proficiency Assessment: Section IV-Passing Scores,
Berkeley, CA: Educational Testing Service, December 1979. 




FIGURE 4 (continued) 



EXAMPLE OF INFORMATION JUDGES MIGHT USE IN SETTING PASSING SCORES 
USING THE COMBINATION METHOD 



Subtest Content: SPELLING



Sample Item: (One of six items)
Mark your answer sheet to show the
word that is spelled wrong.

a. house
b. woman
c. suny
d. have



Expected Performance on Subtest
(Adult Basic Education Class):
85 percent correctly answer item.



Projected Failure/Success Rate of Adult Basic Education Students:

If passing score    The % of students who    The % of students who
is set at:          would probably pass:     would probably fail:

     100%                   38%                      62%
      90%                   71%                      29%
      80%                   87%                      13%
      70%                   92%                       8%
      60%                   95%                       5%
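
A projection table of this kind can be produced from whatever performance data are on hand. The Python sketch below shows one way it might be computed; the function name and the percent-correct scores are invented and do not reproduce the figures in the table above.

```python
from typing import List, Tuple


def projected_rates(
    percent_scores: List[float], cutoffs: List[float]
) -> List[Tuple[float, int, int]]:
    """For each candidate passing score, return (cutoff, % pass, % fail)."""
    n = len(percent_scores)
    rows = []
    for cutoff in cutoffs:
        passed = sum(1 for s in percent_scores if s >= cutoff)
        pass_pct = round(100 * passed / n)
        rows.append((cutoff, pass_pct, 100 - pass_pct))
    return rows


# Invented percent-correct scores for ten examinees.
scores = [100, 100, 100, 100, 90, 90, 90, 80, 70, 50]
for cutoff, pass_pct, fail_pct in projected_rates(scores, [100, 90, 80, 70, 60]):
    print(f"{cutoff:>4}%  {pass_pct:>3}%  {fail_pct:>3}%")
```

Judges can then see at a glance how many examinees each candidate passing score would fail before they commit to one.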



SAMPLE INSTRUCTIONS FOR SETTING PASSING SCORES 
ON LANGUAGE ARTS TEST (COMBINATION METHOD) 

Directions: After considering the information provided for you, please
indicate in the corresponding space on the right what you
consider to be an appropriate minimum passing score for
Adult Basic Education students for each of the areas listed
below. Use percents to show the passing score you think is
appropriate.



Skill Areas                                   Percent

A. Spelling (6 items)                         _______

B. Subject/Verb Agreement (4 items)           _______

C. Ending Punctuation (4 items)               _______

D. Complete Sentences (4 items)               _______

E. Capitalization (6 items)                   _______

Now go back and average the passing scores assigned to each skill area so
that you have a passing score on the total Language Arts test.



Minimum Passing Score for Total Test 
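
The averaging step on this form is straightforward, but it may help to see it worked through. The Python sketch below uses the skill areas and item counts from the form with invented passing percents. It also computes an item-weighted average for comparison; that second figure is an added illustration, not something the form asks for.

```python
from typing import Dict, Tuple


def total_passing_score(
    skill_areas: Dict[str, Tuple[int, float]]
) -> Tuple[float, float]:
    """Return (simple average, item-weighted average) of passing percents.

    skill_areas maps each area to (number of items, passing percent).
    """
    percents = [p for _, p in skill_areas.values()]
    simple = sum(percents) / len(percents)
    total_items = sum(n for n, _ in skill_areas.values())
    weighted = sum(n * p for n, p in skill_areas.values()) / total_items
    return simple, weighted


# Item counts are from the form; the percents are one judge's invented answers.
judgments = {
    "Spelling": (6, 80),
    "Subject/Verb Agreement": (4, 75),
    "Ending Punctuation": (4, 75),
    "Complete Sentences": (4, 50),
    "Capitalization": (6, 70),
}
simple, weighted = total_passing_score(judgments)
print(f"simple average: {simple:.1f}%  item-weighted: {weighted:.1f}%")
```

The two averages differ only slightly here because the skill areas contain similar numbers of items; with very unequal subtests the choice would matter more.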






Interpreting Test Results 



Prior to reporting, test scores must be carefully interpreted. 
Scores in themselves are virtually meaningless unless the context and 
purpose for testing are explained. In addition, the following factors 
should be considered in interpreting the results: 

a. Curriculum characteristics (e.g., discrepancies between what is 
tested and what is taught) 

b. Staff characteristics (e.g., the teaching or administrative 
structure) 

c. Student characteristics (e.g., students' ability, interests)

d. Technical characteristics of the test (e.g., the test's 
reliability and validity) 

e. Sampling characteristics (e.g., the relationship of those tested
to a larger group, to which the user may generalize) 

f. Social factors (e.g., the social meanings of the skills tested 
for the examinees) 

A wide range of perspectives enhances interpretation. Teachers,
curriculum specialists, administrators, employers, guidance counselors,
measurement specialists and students may each be able to contribute
uniquely to interpretation of the test results.



Reporting Test Results

Total reliance on written reports assumes that those awaiting test
results have the interest and time to read and interpret what is
written. This is often not the case. Alternative approaches should be
considered.

The best reporting method depends on what needs to be communicated.
A major distinction can be made between (1) one-way provision of
information (e.g., written reports) and (2) two-way interaction (e.g.,
community meetings). Written or televised reports may sometimes be
useful; in other situations, personal discussions about the results can
be much more appropriate.






33 




Chapter V 

Basic Literacy Assessment 



CHAPTER V 



BASIC LITERACY ASSESSMENT* 



The term basic literacy skills is used here to mean those skills a 
person needs in order to learn and communicate via reading or writing. 
Though curriculum specialists have probably been most active in
delineating these skills, persons from the fields of psychology, history, 
linguistics and library science have all contributed to their 
identification (Bormuth, 1975, p. 70). 

Although exhaustive listings and definitions are not appropriate 
here, reading and writing taxonomies are presented to give the reader a 
general impression of the range of skills to consider when conducting 
basic literacy assessment. 



Reading Skills**

1. Decoding skills: Decoding skills enable a person to recognize 
letters, letter groups and patterns in print and their associated 
oral sounds and meanings. Phonetic analysis (associating sounds with 
letters) and structural analysis (associating syllables, affixes or 
whole words with their corresponding sounds) are usually considered 
decoding skills. 

2. Literal comprehension: Literal comprehension skills enable a person 
to understand information explicitly stated in a text. Included are 
vocabulary skills (assigning the correct meaning to words in context) 
and ability to combine the meanings of words in sentences, or 
sentences in paragraphs.

3. Inference skills: Inference skills allow a person to derive 
information not explicitly stated, i.e., to "read between the lines." 

4. Critical reading skills: A person with critical reading skills can
thoughtfully examine a text for logical consistency, as well as 
detect and evaluate propaganda techniques. 

5. Aesthetic appreciation skills: These skills enable a person to 
evaluate the tone or mood of a story or the rhythm of prose. 

6. Reading flexibility skills: These skills allow a person to read 
faster or slower, depending on the nature of the task, to focus
selectively on parts of the text, and to switch attention to conform 
to a wide variety of instructions. 

7. Study skills: These skills enable a person to use various reference 
materials (e.g., maps, graphs, charts, tables of contents and
diagrams) to locate information and judge its relevance to a
particular task. 



*Chapters VI and VII focus on everyday literacy skills and on-the-job
literacy skills, respectively.

**This reading taxonomy is from Bormuth (1975). Other taxonomies include
that provided by Stiggins (1981), in which seven component skills of
reading are presented. Stiggins' taxonomy assumes that reading is an
interactive process in which the characteristics of the written text
interact with the reader's knowledge.



Writing Skills*



1. Ideas — the quality, development, support and relevance of the 
arguments, opinions and thoughts expressed. 

2. Mechanics — factors such as usage, sentence structure, punctuation and 
spelling. 

3. Organization — a sense of order, ability to stay on topic and relate 
details to a central idea or argument. 

4. Wording and phrasing— the choice and arrangement of words, including 
the deletion of unnecessary words. 

5. Flavor—the personal qualities revealed by the writing— style, 
individuality, originality, interest and sincerity. 

One could assess the kinds of reading or writing skills noted here 
using available tests for several of the testing purposes mentioned in 
Chapter II. The most likely purposes would be diagnosing specific needs 
for further instruction, placing a student in the appropriate level of an 
English or writing course, guiding a student into a general program of 
study, and conducting a survey assessment of literacy skills among a 
broad group of people. 

In several situations, existing basic literacy skills tests are not 
likely to be useful. Most selection or certification decisions affecting 
adults are an example. Such decisions would be less likely to rest on 
basic literacy skills than on job-related or everyday life skills. 
Similarly, most formative or summative evaluation purposes would be best 
served by a test designed specifically to measure skills taught in the 
program. 



Available Tests of Basic Literacy Skills — Reading 

Reading tests are among the most common published tests. Most have 
been developed for elementary school students but a significant number 
exist for junior and senior high school students and adults. In recent 
years, an increasing number of criterion referenced tests of basic skills 
have become available. Although many of these tests are designed for 
elementary and secondary school students, they can also be used with 
adults who are acquiring basic skills. It is important to ensure at the
outset that the test measures the desired skills, and that the test items
do not appear childish to an adult. Most tests designed for junior high
and higher are adaptable for use with adults.



*This writing taxonomy is based on the work of Diederich. In an effort to
identify the characteristics that most influence a writing expert's
judgment of the quality of a piece of writing, Diederich (1974) analyzed
the results of ratings of a large number of student essays. He isolated
the factors given in this taxonomy. The list actually names qualities of
writing. The ability to achieve each of these factors is the skill to be
assessed.



36 




Available tests often provide separate subtest scores. The most 
common subtests on reading tests are decoding, vocabulary, comprehension
and study skills. Published tests measuring oral reading and reading
flexibility skills are less common, but some do exist. 

Appendix C provides a fairly comprehensive list of available reading 
tests published in the last ten years. The reader should view this list 
as a starting point. For further detail on these and other tests, review 
the reference materials mentioned in Chapter II. 



Available Tests of Basic Literacy Skills— Writing 

Two general methods of assessing writing proficiency are used: 
direct and indirect. The direct approach involves gathering samples of
writing and evaluating them according to specified criteria. This method 
simulates real-life writing circumstances and requires trained judges to 
apply the criteria. The indirect approach involves the use of objective 
tests (usually paper-and-pencil multiple choice tests) to measure
language usage skills important in effective writing. This approach is 
often less costly because tests can generally be machine scored. 

Although the results of indirect and direct assessments are generally 
highly related, indirect assessment should not be considered a substitute 
for direct. Each measures different skills. Direct measures focus on
writing composition skills while indirect measures focus on prerequisites 
of good writing. The following lists of skills measured by the two 
methods were distilled from 18 statewide writing assessments being 
conducted in the United States:* 



*This analysis is taken from Bridgeford, N., & R. Stiggins. A Consumer's
Guide to Writing Assessment. Portland, OR: Northwest Regional
Educational Laboratory, 1981.



DIRECT ASSESSMENT

Usage
Sentence sense
Expression of feeling
Persuasiveness
Organization
Format
Cohesiveness
Revision skills
Transition
Overall writing proficiency

INDIRECT ASSESSMENT

Punctuation
Grammar
Placing modifiers
Determining idea relevance
Diction
Style
Transition
Logic
Organization
Overall usage proficiency
Sentence structure






37 



Each method has advantages and disadvantages. The major advantages 
of direct assessment are the extent of information gathered (i.e., when 
one has a chance to examine an actual sample of the examinee's writing); 
the flexibility or adaptability of the exercise to a variety of relevant 
real world writing circumstances; and users' positive response thanks to 
the high face validity of writing samples. The major advantages 
associated with indirect assessment, on the other hand, are high score 
reliability, relatively low test scoring costs and a high degree of 
control over the nature of the skills tested. 

The potential disadvantages of direct assessment include the high 
cost of scoring and the potential lack of control over the skills tested 
(i.e., because every response is unique). The potential limitations of 
the indirect method are the lack of correspondence to real world writing
tasks and the heavy reliance of the assessment method on reading skills
rather than writing proficiency.

There are numerous indirect measures of writing skill available. 
Since most tend to focus primarily on mechanics, the greatest choice is 
available in this area. 

There are fewer direct measures available, but extensive work is 
underway to develop more, and to train people in the effective scoring of
writing samples. A brief description of the basic approaches to scoring 
writing samples is given in Appendix D along with references to three 
excellent publications on the topic. 

Appendix E provides a comprehensive list of available published tests 
of writing skills. Of the 47 tests listed, only nine have a component 
which involves a direct measure of writing. 



38 






Martha: A Case Study in the Selection of Available
Tests of Basic Literacy Skills*

LITERACY SKILLS OF INTEREST

In January, Martha was hired as a community
college counselor in a suburban city in a
Northwestern state. Her responsibility was to
help incoming students determine what courses
they should take. Knowing that nearly all
courses require reading and writing, she decided
to make tests of basic literacy skills part of
the entering assessment process. Martha was told
that in the past reading skills had been measured
with a norm referenced test. Scores were
reported to students and teachers as grade
equivalents. Writing skills had not been
tested. To be sure that she was testing the most
important literacy skills, Martha asked members
of each department in the college to provide her
with representative examples of the reading
materials used in each course, and a list of
required writing tasks.

After analyzing that information (with
assistance from a reading and writing
specialist), she decided to measure student
reading skills in the areas of literal
comprehension, inference and study skills. She
also decided that whatever test she chose should
rely on materials commonly found in school
texts. She also decided to measure students'
writing mechanics skills and their ability to
write a short, well organized, and informative
essay.

Although other skills were needed in some
courses, Martha decided that it should be left to
the instructors to measure those skills. She
would be available to help them find appropriate
tests.

PURPOSE FOR TESTING

Martha's purpose for testing was to
identify students who would benefit from tutorial
assistance, as well as those who should enroll in
a special reading or writing class.



*The case studies in this and subsequent chapters reflect the important 
testing decisions outlined in Chapters II-IV. 






39 






USES AND USERS OF TEST RESULTS

Martha decided to report test scores to
students and their instructors, as well as use
the results herself to counsel students on how
their reading or writing skills might affect
their performance in various classes. She
decided a criterion referenced score would be the
most useful. However, to interpret scores
properly she would need information on the
performance of students in typical classes
throughout the college.

EXAMINEE CHARACTERISTICS

In reviewing college records, Martha learned
that generally about 20 percent of the students
were of Spanish-American descent (half of
those were bilingual), 10 percent were black,
5 percent were from other cultural minority groups
(many of them recent Indochinese refugees with
little or no English skills), and the remainder
were white.

Students whose native language was not
English could indicate upon registration that
they spoke, read or wrote little or no English.
Martha decided these students should not be
tested using the approach she was planning, since
their unfamiliarity with the language would make
the results useless. Instead, such students
would be assessed informally by teachers in
English as a Second Language (ESL) classes, using
tests developed by ESL teachers. Such tests
would be based on current instructional materials
and information gained from seminars on the
cultural characteristics of those for whom
English was a second language.

LOGISTICAL CONSIDERATIONS

Martha had been told by her supervisor that
she could test for no longer than two hours.
Students would be tested in small groups
throughout registration week and the first week
of classes. They would sign up for the testing
time most convenient for them.

To simplify administration, Martha wanted
one reading test that would cover all three
reading skills categories: literal
comprehension, inference, and study skills.

The community college had access to test
scoring services, so she could have the tests
machine scored.

TEST SELECTION PROCESS

Since Martha had a limited budget and short
timeline, she considered only existing tests. If
she could not find what she needed, she would
have to postpone measurement until the following
year.



40 




In beginning her search for the right 
instrument, Martha decided to consider tests 
designed for junior and senior high school 
students, since there were few criterion 
referenced tests specifically designed for 
adults. Also, she knew from talking to the 
previous counselor that many students who had 
difficulty in classes had skills comparable to 
those of the typical eighth or ninth grader. 

Knowing the kinds of reading and writing
skills she wanted to measure, Martha went through
the lists of tests in Appendices C and E. She
looked up each test in the Mental Measurements
Yearbook, Tests in Print, News on Tests, or the
Consumer's Guide to Writing Assessment to learn
as much as possible about each one.

Martha found four tests which appeared to be
likely candidates for measuring reading skills.
These were (1) Individualized Criterion
Referenced Testing: Reading (ICRTR),
(2) Mastery: An Evaluation Tool: Reading
(SOBAR), (3) Criterion-Referenced: Reading
Tactics, and (4) Analysis of Skills (ASK):
Reading.

After reading the Consumer's Guide to
Writing Assessment, Martha decided the best
approach to measuring students' writing ability
was to use a direct measure of writing (i.e.,
have students compose a writing sample following
carefully designed instructions). Since she also
wanted to measure writing mechanics, she decided
to first review published tests which included
writing samples as well as multiple choice items
focused on writing mechanics.

Martha decided to review the Basic Skills
Assessment Program: Writer's Skills Test, IOX
Basic Skills Tests: Secondary Level (Writing),
WRITE: Senior High, and Writing Proficiency
Program for possible use in measuring writing.

Martha reviewed the reading and writing 
tests using the review procedure specified in 
Chapter II. Her ratings are shown in Figure 5. 
After weighing the pros and cons of each she 
decided to use Mastery: An Evaluation Tool: 
Reading (SOBAR) and the Basic Skills Assessment 
Program: Writer's Skills Test . She selected the 
SOBAR because it included a wide variety of study 
skills items in addition to appropriate reading 
items. It could also be customized to match her 
objectives and time limitations, and included an 
optional section on reading in content areas. 



41 



FIGURE 5 

TECHNICAL REVIEW OF BASIC READING SKILLS TESTS 
(AN EXAMPLE) 



Rate each test of interest using the rating system described in the 
instructions in Figure 2. 



Validity 

1. Do the test items measure the specific learning objectives or skills 
that need to be assessed?

List learning objectives or skills to be assessed in the left hand 
column of the chart below. Put the name of each test being reviewed 
in a numbered box at the top of one of the right hand columns. 
(Repeat on each page of the test review chart.) Under each test 
indicate the number of items measuring each objective. Review the 
test manual or the actual test to determine if each objective or 
skill is measured. The publisher may indicate in the manual what 
skill each test item measures. If not, examine each test item to
determine what skill or objective it measures. 



Test Name

Learning objectives or skills to be assessed:

1. Literal comprehension
2. Inference
3. Study skills

[The handwritten test names and item counts entered in this chart are not legible in this reproduction.]

For the remainder of the questions, assign points as described in the
Instructions for Assigning Ratings in the Technical Review of Tests Form
(Figure 2). The number in parentheses after each question indicates the
number of possible points.
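
A reviewer who prefers to total the ratings by machine could use a sketch like the one below. The point values are those shown in parentheses after each question on this form. Everything else is an assumption made for illustration: the function name, the sample ratings, and above all the handling of "not reported" entries, which here are listed separately instead of being scored. The Figure 2 instructions govern how such entries should actually be rated.

```python
# Maximum points per question, taken from the numbers in parentheses on the form.
MAX_POINTS = {
    "validity": {2: 6, 3: 3, 4: 2, 5: 2},
    "usability": {n: 2 for n in range(1, 9)},
    "reliability": {1: 3, 2: 2},
}


def tally(ratings):
    """Return (points earned, points possible, list of unrated questions).

    `ratings` maps section -> {question number: points, or None if not reported}.
    Assumption: unrated questions earn nothing and are reported back to the user.
    """
    earned = 0
    not_reported = []
    for section, questions in MAX_POINTS.items():
        for number, maximum in questions.items():
            points = ratings.get(section, {}).get(number)
            if points is None:
                not_reported.append((section, number))
                continue
            if not 0 <= points <= maximum:
                raise ValueError(f"{section} question {number}: {points} is out of range")
            earned += points
    possible = sum(sum(q.values()) for q in MAX_POINTS.values())
    return earned, possible, not_reported


# Hypothetical ratings for one test (not taken from Figure 5).
example = {
    "validity": {2: 5, 3: 3, 4: None, 5: 1},
    "usability": {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 1, 8: 2},
    "reliability": {1: 3, 2: 2},
}
print(tally(example))
```

Totals of this kind only summarize the form; the match between test items and objectives (question 1) still has to be weighed separately.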






42 




Figure 5 (continued)

Test Name

[Handwritten test names and ratings, including entries marked NR*, are not legible in this reproduction.]

2. Were the test items developed
in a systematic and rigorous
manner so that the content is
adequate and bias is
minimized? (6)

3. Were any empirical procedures
used for screening or selecting
items to ensure that items are
measuring what they were
designed to measure, are
understandable, contain
reasonable answers, and are
free of ambiguous alternative
answers and unnecessarily
complex language? (3)

4. Is the validating group
representative of the
population with which the test
is to be used? (2)

5. Are any special validity
studies reported or
specifically referenced? (2)

Usability

1. Are the test items suitable for
adults with limited literacy
skills? (2)

2. Are instructions to the test
administrator clear and
complete? (2)

*Not reported in materials available to reviewer.

43 




Figure 5 (continued)

Test Name

[Handwritten test names and ratings are not legible in this reproduction.]

3. Are instructions to the
examinee written in clear and
understandable terms? (2)

4. Is the test formatted clearly?
(2)

5. Is there a simple way for
examinees to record their
responses? (2)

6. Is the type of score reported
useful for my situation? (2)

7. Is the process of converting
raw scores simple and does it
yield scores which are easily
interpreted? (2)

8. Is the amount of time required
for testing appropriate? (2)

Reliability

1. Is reported reliability for
major subtests and/or total
test scores sufficiently high?
(3)

2. Are the scoring procedures
clear and complete, thus
ensuring reliable scores? (2)

*Time varies according to number of objectives. [Handwritten note, partly legible.]



FIGURE 6 

TECHNICAL REVIEW OF WRITING SKILLS TESTS 
(AN EXAMPLE) 



Rate each test of interest using the rating system described in the 
instructions in Figure 2. 

Validity 

1. Do the test items measure the specific learning objectives or skills
that need to be assessed?

List learning objectives or skills to be assessed in the left hand
column of the chart below. Put the name of each test being reviewed
in a numbered box at the top of one of the right hand columns.
(Repeat on each page of the test review chart.) Under each test
indicate the number of items measuring each objective. Review the
test manual or the actual test to determine if each objective or
skill is measured. The publisher may indicate in the manual what
skill each test item measures. If not, examine each test item to
determine what skill or objective it measures.



Test Name

Learning objectives or skills to be assessed

[The handwritten test names, objectives and item counts entered in this chart are not legible in this reproduction.]



For the remainder of the questions, assign points as described in the
Instructions for Assigning Ratings in the Technical Review of Tests Form
(Figure 2). The number in parentheses after each question indicates the
number of possible points.






45



Figure 6 (continued)

Test Name

[Handwritten test names and ratings are not legible in this reproduction.]

2. Were the test items developed
in a systematic and rigorous
manner so that the content is
adequate and bias is
minimized? (6)

3. Were any empirical procedures
used for screening or selecting
items to ensure that items are
measuring what they were
designed to measure, are
understandable, contain
reasonable answers, and are
free of ambiguous alternative
answers and unnecessarily
complex language? (3)

4. Is the validating group
representative of the
population with which the test
is to be used? (2)

5. Are any special validity
studies reported or
specifically referenced? (2)

Usability

1. Are the test items suitable for
adults with limited literacy
skills? (2)

2. Are instructions to the test
administrator clear and
complete? (2)




46



Figure 6 (continued)

Test Name

[Handwritten test names and ratings are not legible in this reproduction.]

3. Are instructions to the
examinee written in clear and
understandable terms? (2)

4. Is the test formatted clearly?
(2)

5. Is there a simple way for
examinees to record their
responses? (2)

6. Is the type of score reported
useful for my situation? (2)

7. Is the process of converting
raw scores simple and does it
yield scores which are easily
interpreted? (2)

8. Is the amount of time required
for testing appropriate? (2)

Reliability

1. Is reported reliability for
major subtests and/or total
test scores sufficiently high?
(3)

2. Are the scoring procedures
clear and complete, thus
ensuring reliable scores? (2)

*Not reported in materials available to reviewers.

47



She selected the Writer's Skills Test
because it had a relatively short administration
time while maintaining a high level of
reliability. She had the most complete
information on this test and it closely matched
her criteria.

SETTING PERFORMANCE LEVELS

Martha decided to use the Borderline Method
(see p. 23) to set a level of performance that
would indicate whether a student should have
special reading or writing assistance. She asked
teachers in all the freshman classes to identify
students whom they considered "borderline" (i.e.,
possibly inadequate) in their reading and writing
skills. These students would be tested in the
spring, and the results used to set a standard
Martha could refer to in counseling future
students on the advisability of receiving extra
help.
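
The arithmetic behind a standard of this kind can be sketched briefly. The sketch assumes that the cutoff is taken as the median score of the teacher-identified borderline group, which is a common way of applying a borderline-group approach; the procedure described on p. 23 should be followed where it differs. The scores and function names are hypothetical.

```python
from statistics import median


def borderline_cutoff(borderline_scores):
    """Cutoff taken as the median score of the borderline group (an assumption)."""
    if not borderline_scores:
        raise ValueError("at least one borderline score is required")
    return median(borderline_scores)


def needs_assistance(score, cutoff):
    """Flag a student whose score falls below the cutoff."""
    return score < cutoff


# Hypothetical percent-correct scores for seven students judged "borderline".
spring_scores = [52, 58, 60, 61, 63, 66, 70]
cutoff = borderline_cutoff(spring_scores)
print(cutoff, needs_assistance(55, cutoff), needs_assistance(61, cutoff))
```

A flag produced this way is a prompt for a counseling conversation, not a decision in itself.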

INTERPRETING AND REPORTING RESULTS

Once Martha receives the test scores, she
will determine which students fall below the
cutoff. She will meet with each student who
scored below the cutoff to discuss possible
problems. In many cases she may recommend
special help in reading or writing.

Martha will make the test results available
to other students upon request, and will confer
with either students or teachers regarding test
results.



48 



Chapter VI 

Assessment of Everyday 
Literacy Activities 






CHAPTER VI 



ASSESSMENT OF EVERYDAY LITERACY ACTIVITIES


There is growing concern that many adults lack the literacy skills to
perform everyday tasks, such as reading road signs, maps, recipes and
simple instructions. Whether the problem is actually due to inadequate
reading skills or other conceptual limits remains a matter of debate. In
any case, there are increasing numbers of tests that measure everyday
skills involving reading and writing, and instructors of adults are
frequently interested in assessing their students' abilities to perform
these everyday tasks.

Not all such tests demand the same skills. For example, many 
everyday activities, such as filling out a job application, require that 
a person both read and write. Depending on what skills one wishes to 
measure, a test which requires the student to actually fill out an
application may be preferable to one in which the student simply selects 
the correct answer from among several alternatives. Or, if one does not 
want to test both reading and writing on the same test item, a multiple 
choice test may be preferable. 

Published tests of everyday activities best fulfill two testing
purposes: diagnosis of students' skills and survey assessment (see pp.
4-5). Summative and formative evaluation and certification of minimal
skills are also possible if the test is designed for a population very
similar to that with which it will be used, and if the skills tested are
clearly those being taught or those to be certified. Tests of everyday
activities are not generally appropriate for use in selection.



Available Tests of Everyday Literacy Activities 

Fourteen published tests of everyday literacy activities are listed 
in Appendix F. This list provides a starting point for locating useful
tests of this type. 

Nathan: A Case Study in the Selection of Available Tests
of Everyday Literacy Activities 

LITERACY SKILLS OF INTEREST

Nathan was a reading specialist in the Spring
Park Community College Adult Basic Education
program. One class there was designed
specifically to help people acquire literacy
skills essential for everyday life tasks (e.g.,
reading road signs, completing job applications,
reading want ads, following simple directions).



50 






PURPOSE FOR TESTING/USES AND USERS OF TEST RESULTS

Nathan wanted to test students upon
registration to obtain a general idea of what
tasks they had difficulty with. He would use the
results in conjunction with other information to
plan instructional activities for each student.
Nathan also wanted to administer the test again
at the end of the year as part of a summative
evaluation to determine how well skills were
attained. The results would be used solely by
Nathan in planning course changes for next year.

EXAMINEE CHARACTERISTICS

Nathan had taught this course before and was
quite familiar with the type of students who
signed up for the class. Many students were
native English speakers who, for one reason or
another, had not acquired many literacy skills
during their previous schooling. They had
problems with certain everyday tasks requiring
literacy skills.

For another group of students, English was a
second language. Some of these students had
lived in the United States for a considerable
time and were reasonably familiar with the
predominant culture. Others were recent
immigrants (including many Indochinese refugees)
who had taken ESL classes for about one year.
They now wanted to focus on literacy skills in
everyday life.

LOGISTICAL CONSIDERATIONS

Nathan wanted a test which would take no
longer than one hour to administer. Since he
expected to have fewer than 40 students, he
planned to hand score the test. He decided to
divide the students into two groups for
administration. While one group was taking the
test, proctored by an assistant, he would
interview the others to obtain additional
information helpful in planning curriculum.

TEST SELECTION PROCESS

Nathan reviewed the list of tests presented
in Appendix F. He found some of them described
and reviewed in Tests of Adult Functional
Literacy: A Review of Currently Available
Instruments and further information on others in
News on Tests. Based on this information, he
identified the Adult Performance Level Functional
Literacy Test, Senior High Assessment of Reading
Performance (SHARP), and Reading/Everyday
Activities in Life (R/EAL) tests as most
appropriate for his needs. Next, he obtained
copies of the tests from the publishers. He






51 



looked at each item carefully to see whether it 
related to what he usually taught in the course, 
and whether the item content seemed appropriate 
for his students. He completed the technical 
quality rating form as shown in Figure 7. 

None of the tests matched as well with his 
course content as Nathan would have liked. He 
felt that situations described in each test would 
be appropriate for his native English speaking 
students and most of the bilingual students who 
had been in the United States for several years. 
But the more recent immigrants, he suspected, 
would find some of the situations described in 
test items so unfamiliar that their responses 
would not reflect their true literacy skills. 
Nathan concluded that while a test would be 
useful, he would have to take these factors into 
account in using the results. 

Nathan decided to use the R/EAL test because
it matched his course content better than the
others. It also required students to write
answers; the other tests used a multiple choice
format. Nathan thought written answers would be
helpful in diagnosing student problems.
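
The comparison of each test against course content can be made explicit with a small sketch. The objectives are the everyday tasks named earlier in this case study; the two test names and their item counts are invented placeholders, not figures for any published test, and the function names are assumptions. The sketch ranks tests only by the share of course objectives that have at least one item.

```python
def coverage(objectives, items_per_objective):
    """Fraction of course objectives measured by at least one test item."""
    covered = [o for o in objectives if items_per_objective.get(o, 0) > 0]
    return len(covered) / len(objectives)


def best_match(objectives, tests):
    """Name of the test covering the largest share of the objectives."""
    return max(tests, key=lambda name: coverage(objectives, tests[name]))


course_objectives = ["road signs", "job applications", "want ads", "simple directions"]

# Hypothetical item counts per objective for two unnamed tests.
candidate_tests = {
    "Test A": {"road signs": 5, "want ads": 4},
    "Test B": {"road signs": 5, "job applications": 5, "want ads": 5},
}

winner = best_match(course_objectives, candidate_tests)
print(winner, coverage(course_objectives, candidate_tests[winner]))
```

Coverage is only one consideration. Response format and the familiarity of the item content to the students, both of which Nathan weighed, are not captured by a count.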

SETTING PERFORMANCE LEVELS

Once Nathan had selected the test,
he thought more about the variety of meanings
literacy had for his students. He realized even
more than before that it would be inappropriate
to use a particular performance level for making
unilateral decisions. His students simply
differed too much in background and experience.

INTERPRETATION AND REPORTING OF RESULTS

Nathan decided that the test would be
adequate for one of his purposes: initial
placement and instructional planning for
individual students. However, he modified his
original plan to use the test for evaluation.
The test did not reflect either the curriculum or
his students' characteristics accurately enough
to justify use in summative evaluation.

Nathan decided to meet with a test
development specialist to find out what
information to collect during the year so that
they could develop a test suitable for course
evaluation.

Once tests were scored, Nathan planned to
meet with each student to discuss the kinds of
everyday tasks on which the student had
difficulty. This review would be only a first
step toward learning more about what each student
viewed as his/her difficulties, and what further
learning was desired.





FIGURE 7 

TECHNICAL REVIEW OF EVERYDAY LITERACY ACTIVITIES TESTS 

(AN EXAMPLE) 

Rate each test of interest using the rating system described in the 
instructions in Figure 2. 

Validity 

1. Do the test items measure the specific learning objectives or skills 
that need to be assessed? 

List learning objectives or skills to be assessed in the left hand 
column of the chart below. Put the name of each test being reviewed
in a numbered box at the top of one of the right hand columns. 
(Repeat on each page of the test review chart.) Under each test 
indicate the number of items measuring each objective. Review the 
test manual or the actual test to determine if each objective or 
skill is measured. The publisher may indicate in the manual what 
skill each test item measures. If not, examine each test item to 
determine what skill or objective it measures. 



[Chart: learning objectives or skills to be assessed (rows) by test
name (numbered columns), with the number of items measuring each
objective entered under each test. The handwritten sample entries are
largely illegible in this copy. Recoverable column headings include
APL and SHARP; recoverable row labels include Telephone,
Advertisements, Road Map, Want Ad, and Job Application.]







For the remainder of the questions assign points as described in the 
Instructions for Assigning Ratings in the Technical Review of Tests Form 
(Figure 2). The number in parentheses after each question indicates the 
number of possible points. 






Figure 7 (continued) 



2. Were the test items developed in a systematic and rigorous
manner so that the content is adequate and bias is
minimized? (6)

3. Were any empirical procedures used for screening or selecting
items to ensure that items are measuring what they were
designed to measure, are understandable, contain reasonable
answers, and are free of ambiguous alternative answers and
unnecessarily complex language? (3)

4. Is the validating group representative of the population with
which the test is to be used? (2)

5. Are any special validity studies reported or specifically
referenced? (2)

Usability

1. Are the test items suitable for adults with limited literacy
skills? (2)

2. Are instructions to the test administrator clear and
complete? (2)

[The handwritten sample ratings entered under each test name are not
legible in this copy. A handwritten footnote reads, approximately,
"Not reported in materials available to reviewer."]





Figure 7 (continued) 



3. Are instructions to the examinee written in clear and
understandable terms? (2)

4. Is the test formatted clearly? (2)

5. Is there a simple way for examinees to record their
responses? (2)

6. Is the type of score reported useful for my situation? (2)

7. Is the process of converting raw scores simple and does it
yield scores which are easily interpreted? (2)

8. Is the amount of time required for testing appropriate? (2)

Reliability

1. Is reported reliability for major subtests and/or total test
scores sufficiently high? (3)

2. Are the scoring procedures clear and complete, thus ensuring
reliable scores? (2)

[The handwritten sample ratings entered under each test name are not
legible in this copy.]
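The point system used in Figure 7 lends itself to a simple tally. The
following Python sketch is illustrative only. The point maxima are the
numbers in parentheses after each Figure 7 question, but the ratings
shown for the test are hypothetical, and the helper name
summarize_ratings is an assumption made for this example rather than
part of the Guide. Questions for which information was not reported are
listed separately instead of being scored as zero.

```python
# Maximum points per question, from the numbers in parentheses in Figure 7.
MAX_POINTS = {
    "validity": {2: 6, 3: 3, 4: 2, 5: 2},
    "usability": {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2},
    "reliability": {1: 3, 2: 2},
}


def summarize_ratings(ratings):
    """Return {category: (earned, possible, not_reported_questions)}.

    ratings maps category -> {question number: points or None}. None (or a
    missing question) means the information was not reported.
    """
    summary = {}
    for category, maxima in MAX_POINTS.items():
        earned = 0
        not_reported = []
        for question, limit in maxima.items():
            points = ratings.get(category, {}).get(question)
            if points is None:
                not_reported.append(question)
                continue
            if not 0 <= points <= limit:
                raise ValueError(
                    f"{category} question {question}: {points} is outside 0-{limit}"
                )
            earned += points
        summary[category] = (earned, sum(maxima.values()), not_reported)
    return summary


# Hypothetical ratings for one test (not the values in the Figure 7 example).
test_a = {
    "validity": {2: 5, 3: 2, 4: None, 5: 1},
    "usability": {1: 2, 2: 2, 3: 1, 4: 2, 5: 2, 6: 1, 7: 2, 8: 2},
    "reliability": {1: None, 2: 2},
}

for category, (earned, possible, missing) in summarize_ratings(test_a).items():
    print(f"{category}: {earned} of {possible} points; not reported: {missing}")
```

Keeping the three categories separate, rather than adding them into one
grand total, preserves the distinction the Guide draws among validity,
usability, and reliability.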










CHAPTER VII

ASSESSMENT OF ON-THE-JOB LITERACY



While literacy skills alone rarely determine job success, many jobs
demand skill in reading and writing. Thus, assessment of such skills is
useful in countless personnel decisions, including those involving staff
development and job placement.

When Mikulecky (1980) studied documentation of work related literacy
competencies, however, he found that fewer and fewer businesses were
using or making available tests designed for employees. The possible
reasons are numerous. Schultz (1975) and Hunt and Lindley (1977) found,
for instance, that pre-employment tests are often extremely difficult to
read and not commensurate with the reading levels of particular jobs.

A primary concern when assessing job competency is the legal
acceptability of tests. In Griggs vs. Duke Power Company (U.S. Supreme
Court, 1971, 3 FEP Cases 175), the court ruled that tests used to measure
job competency must be validated as job related. Showing this
relationship can be a complex and time consuming task. Job analysis is a
critical first step in test selection or development.

One of the most enlightening studies of on-the-job literacy was done
by Sticht (1975)*. For nearly a decade Sticht analyzed the role of
literacy in selected military occupations. He concluded that there are
two fundamental purposes for reading at work: reading-for-learning and
reading-for-doing. A reading-to-do task involves looking up information
necessary to accomplish a certain task, information which can then be
forgotten. Following a recipe, locating parts in a supply catalog,
completing an inventory checklist and studying a computer reference
manual are examples of reading-to-do. Reading-for-learning requires
understanding and assimilating information for later use.

Reading-to-do requires searching for needed information. This may 
involve using tables of contents and reference indexes. While 
memorization is not necessary, some may occur naturally through 
repetition and association. Knowledge of how and where to locate 
information efficiently must, of course, be acquired and retained. 

Reading-for-learning predominates in the school setting and in many
job training programs, especially for positions above entry level. It is
appropriate when procedures must be learned because reference documents
are not conveniently available (e.g., at a construction site) or because
time requirements demand that complex tasks be completed quickly.
Reading-for-learning also occurs when a job requires higher order mental
operations, such as evaluation and information synthesis to solve complex
problems. For example, a law clerk may need to read several past court
cases and evaluate and synthesize the information to determine
implications for a current case. Or a computer programming supervisor
may need to read a variety of articles to glean and synthesize
information helpful in analyzing the reasons for certain recurrent
programming problems.



*Other studies have been done by Thomas and Smotherman (1976) and Moe,
Rush and Storlie (1979).






Stiggins (1981) analyzed the differences between reading-to-learn and
reading-to-do in terms which have implications for testing. He states
that reading-to-do requires identifying and constructing word meanings
automatically, taking advantage of explicit structure provided by the
author to quickly assess needed information. The required vocabulary is
defined by the specific job context. The memory demands are not high
since the material used is generally familiar, provided in an easily used
form and readily available for reference. The information read does not
have to be integrated with existing knowledge but does have to be
compared with existing knowledge to ensure accuracy.

On the other hand, Stiggins maintains that reading-to-learn, though
also requiring the reader to automatically identify or construct word
meanings, differs from reading-to-do. Materials designed for
reading-to-learn are usually presented in narrative form with an
implicit, not an explicit, structure. The reader must impose a summary
structure on the material that facilitates integration of the material
with existing knowledge. Effective learning is contingent on reading
from the proper perspective. The memory demands are higher since the
material learned is generally unfamiliar and must be compared with
existing knowledge. In reading-to-learn, the reader relies predominantly
on careful processing of written material, as opposed to rapid scanning.
Information must be evaluated in terms of accuracy and appropriateness
and assimilated into one's existing knowledge structure. In short,
reading-to-learn requires more complex information processing.

The United States Army is currently conducting task analyses of over 
120 basic army jobs to determine their literacy requirements (Mikulecky, 
1980). Competencies and patterns of competencies across jobs are being 
determined to facilitate development of training programs. This approach 
assumes that seemingly similar competencies can be taught for several
different jobs at the same time. Scribner and Cole's (1979) work
suggests that this assumption may not be wholly accurate. What makes a 
reader competent may not be mastery of a few basic skills easily 
transferable to different settings, but rather, mastery of a wide variety 
of specific reading skills gained through numerous diverse experiences. 

Moe, Rush and Storlie (1979) have investigated the literacy
requirements of ten semi-skilled and skilled occupations and the
corresponding requirements necessary to succeed in training programs for
each of those occupations. These job specific literacy requirements are
described to aid educators, counselors and administrators in providing
services to adults who aspire to these occupations but have minimal
literacy skills. Recommendations for instructional programs for each of
the occupations are included. The following occupations were studied:

Account Clerk
Automotive Mechanic
Draftsman
Electrician
Heating and Air Conditioning Mechanic
Industrial Maintenance Mechanic
Licensed Practical Nurse
Machine Tool Operator
Secretary
Welder






Available Tests of On-the-Job Literacy 

Available tests for on-the-job literacy are limited primarily to 
clerical positions (Mikulecky, 1980). Tests for other positions often
measure factors other than literacy. Appendix G lists available clerical
tests which may be useful in certain situations. These tests may be
adequate for diagnosing students' skills, placing them in appropriate
training programs and, in some cases, for certification. In
certification, careful task analysis and establishment of appropriate 
criterion competence levels required for each job are critical. 

Phyllis: A Case Study in the Selection of Available
Tests of On-the-Job Literacy Skills

LITERACY SKILLS OF
INTEREST

Phyllis was a personnel officer for Aloha
Computer Company, a large computer assembly
firm. She was responsible for ensuring that
new employees had adequate skills to perform the
tasks required in their respective positions.
Supervisors had been particularly concerned
recently about the literacy skills of employees,
and many had asked Phyllis to start routinely
testing applicants' reading and writing skills.

Phyllis was concerned about both the 
potential legal problems associated with such 
testing and the usability of the results. She 
had read how important it was to be sure any
reading and writing skills measured were clearly 
identified as those necessary for a given job. 

Phyllis decided to begin by identifying 
tests developed for specific jobs. After 
reviewing the organizational positions and 
talking with a local university measurement 
professor, she decided that the clerical 
positions were the only ones for which existing
tests might be useful. 

Phyllis had job descriptions for all 
clerical positions on file. She also had samples 
of the types of letters and memos which clerical 
staff were asked to write, and samples of the 
manuals and other materials they had to read. 
For other areas she would initiate a task
analysis and begin working with an assessment
specialist to develop measures specifically
designed for their situation. She thought it
would take at least a year before any of these
measures were ready for use in selection. 






PURPOSES OF TESTING/
USES AND USERS OF
TEST RESULTS

Phyllis wanted to use test results as
part of the criterion for hiring. She also
wanted to use results in determining current
staff development needs. In addition to Phyllis,
personnel selection and staff development
committee members would have access to results.



EXAMINEE 
CHARACTERISTICS 



Phyllis had worked at the Aloha Computer
Company for seven years and was quite familiar 
with the type of applicants Aloha usually 
received for clerical positions, and the 
characteristics of successful and unsuccessful 
employees in these positions. 

Most applicants were recent high school 
graduates, though lately more women who had been 
housewives for 10 to 20 years were applying. The 
community was about 30 percent Black, 5 percent 
Spanish and 5 percent Asian and other 
minorities. Phyllis wanted to be sure the test 
did not unfairly discriminate among applicants 
based on race. Nearly all applicants were 
accustomed to taking standardized tests. 



LOGISTICAL 
CONSIDERATIONS 



Each applicant would be given the test 
when they applied for a position. It was 
important that the test take no longer than 30 
minutes to administer. 



TEST SELECTION 



Phyllis reviewed the list of tests in
Appendix G. She called each publisher and asked
for sample copies. After reviewing several tests
using the procedures given in Chapter II, she 
decided to use the Short Tests of Clerical 
Ability. 



SETTING PERFORMANCE 
LEVELS 



Phyllis planned to administer the test to a 
randomly selected sample of present employees,
then use the results in establishing a criterion 
level of performance. She planned to use the 
combination method (p. 27) for setting the 
performance standard. She would carefully select 
judges from various positions (supervisors, 
office managers, clerical staff) to participate 
in standard setting. She would also share 
results from this test with the staff development
committee. 



INTERPRETING AND
REPORTING RESULTS

Phyllis decided to report results to the
selection committee using a graph which showed
the criterion score and each examinee's score
(see Figure 8). She would emphasize the
importance of interpreting scores in light of
other information obtained via interviews and
references.
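A chart of the kind described above, with each examinee's score shown
against the criterion score, can be produced with a few lines of code.
The sketch below is a hedged illustration in Python. The function name
score_chart and the applicants' scores are assumptions made for the
example; only the applicant names and the criterion score of 30 are
taken from Figure 8.

```python
def score_chart(scores, criterion, scale_max=70, step=10):
    """Return a plain-text chart with one row per applicant.

    scores maps applicant name -> whole-number score between 0 and
    scale_max. Each score is marked with X and the criterion score with |.
    """
    name_width = max(len(name) for name in scores) + 2
    # Place each axis label so that it starts at its own column position.
    axis = [" "] * (scale_max + 4)
    for tick in range(0, scale_max + 1, step):
        label = str(tick)
        axis[tick:tick + len(label)] = label
    lines = [" " * name_width + "".join(axis).rstrip()]
    for name, score in scores.items():
        row = [" "] * (scale_max + 1)
        row[criterion] = "|"
        row[score] = "X"  # an X that falls on the criterion column replaces the |
        lines.append(name.ljust(name_width) + "".join(row).rstrip())
    return "\n".join(lines)


# Hypothetical scores, invented for illustration.
applicants = {
    "Andrews, P.": 42,
    "Barclay, R.": 27,
    "Davis, J.": 55,
    "Levine, T.": 30,
    "Lopez, B.": 36,
}
print(score_chart(applicants, criterion=30))
```

Showing the criterion on every row makes it easy to see how far each
score lies from the standard, which supports the point that a score
near the criterion should be read alongside interview and reference
information.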






Figure 8 

Short Tests of Clerical Ability Results 
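Applicant        0    10    20    30*   40    50    60    70
Andrews, P.      X
Barclay, R.      X
Davis, J.        X
Levine, T.       X
Lopez, B.        X
[The horizontal position of each X, which showed the applicant's
score on the scale, is not preserved in this copy.]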



The two-to-five-page report to the staff 
development committee would not give information
on specific examinees. Rather, it would discuss
which skills appeared most deficient. Important 
concerns raised during the standard setting 
sessions would also be reported. 



*Criterion score






Summary 



This Guide has attempted to introduce the reader to the key issues in 
selecting existing tests to measure adult literacy. Extensive research
and development is still needed to provide educators with adequate adult 
literacy measures. 

The staff of the Functional Literacy Project welcome your comments on
the utility of this Guide. Suggested additions or changes to be made
when the Guide is revised should be directed to:



Assessment and Measurement Program Director 
Northwest Regional Educational Laboratory 
300 S.W. Sixth Avenue 
Portland, Oregon 97204 



References 



Anderson, B., R. Stiggins, & S. Hiscox. Guidelines for Selecting Basic
Skills and Life Skills Tests. Portland, OR: Northwest Regional
Educational Laboratory, 1980.

Bormuth, J. Toward a literate society (Eds. J. B. Carroll & J. S.
Chall). New York: McGraw Hill Book Co., 1975.

Bridgeford, N., & R. Stiggins. A Consumer's Guide to Writing
Assessment. Portland, OR: Northwest Regional Educational Laboratory,
1981.

Buros, O. K. (Ed.) Tests in Print II. Highland Park, NJ: The Gryphon
Press, 1974.

Buros, O. K. (Ed.) The Eighth Mental Measurements Yearbook. Highland
Park, NJ: The Gryphon Press, 1978.

Hunt, T., & C. Lindley. Documentation of selection and promotion test
questions: Are your records sagging? Public Personnel Management,
October-November, 1977, 6.

Mikulecky, L. Literacy competencies and youth employment, a paper
prepared for the National Institute of Education and the Department of
Labor, 1980.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of account
clerk on the job and in a vocational training program (Project Report).
West Lafayette, IN: Department of Education, Purdue University,
November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of draftsman
on the job and in a vocational training program (Project Report). West
Lafayette, IN: Department of Education, Purdue University, November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of electrician
on the job and in a vocational training program (Project Report). West
Lafayette, IN: Department of Education, Purdue University, November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of secretary
on the job and in a vocational training program (Project Report). West
Lafayette, IN: Department of Education, Purdue University, November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of heating and
air conditioning mechanic on the job and in a vocational training program
(Project Report). West Lafayette, IN: Department of Education, Purdue
University, November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of welder on
the job and in a vocational training program (Project Report). West
Lafayette, IN: Department of Education, Purdue University, November 1979.

Moe, A., R. Rush, & R. Storlie. The literacy requirements of licensed
practical nurse on the job and in a vocational training program (Project
Report). West Lafayette, IN: Department of Education, Purdue
University, November 1979.

Nafziger, D., et al. Tests of Functional Adult Literacy: An Evaluation
of Currently Available Instruments. Portland, OR: Northwest Regional
Educational Laboratory, 1975.

Schultz, C. An unnecessary chore for employment applicants. Public
Personnel Management, 1975, 4.

Sticht, T. Reading for working: A functional literacy anthology.
Alexandria, VA: Human Resources Research Organization, 1975.

Stiggins, R. An Analysis of the Dimensions and Testing of Job Related
Reading. Portland, OR: Northwest Regional Educational Laboratory, 1981.

Thomas, J., & E. Smotherman. Catalog of performance objectives,
criterion referenced measures and performance guides for carpenters.
Vocational/Technical Consortium of the States, State Department of
Education and Department of Vocational Education, University of Kentucky,
1976.



Appendices 



APPENDIX A 



SUGGESTED READINGS ON DEFINITIONS OF FUNCTIONAL LITERACY*

*For references on other aspects of functional literacy, see Reder, S.,
M. F. Walton, and K. R. Green. A bibliographic guide to functional
literacy. Portland, OR: Northwest Regional Educational Laboratory,
1979. (ED 189 197)

Bormuth, J. R. Reading literacy: Its definition and assessment.
Reading Research Quarterly, 1973-74, 9(1), 7-66. A revised and
abridged version was subsequently published in J. B. Carroll & J. S.
Chall (Eds.), Toward a literate society. New York: McGraw Hill Book
Co., 1975.

This monograph presents a thorough discussion of the concept of
literacy, oriented towards assessment and measurement issues. The
analysis of literacy focuses on identifying the parameters which must
be specified in any definition of literacy. Corresponding
measurement issues and available literacy assessment procedures are
described. The author is refreshingly aware that literacy cannot be
parameterized only in terms of certain characteristics of the people
being treated; performance depends critically on the characteristics
of the written materials and the reading tasks.

Harman, D. Illiteracy: An overview. Harvard Educational Review, 1970,
40(2), 226-243.

The author reviews current definitions of illiteracy and
functional literacy and discusses their relationship to estimates of
the extent of illiteracy and to literacy education. Although the
article is somewhat dated, Harman's anticipation of the need to
measure literacy in relation to the functional requisites of
particular societies is noteworthy. He argues that adult basic
education efforts here and abroad should be planned on a situation-
specific basis, with goals, content and evaluative components derived
independently of the usual grade school equivalencies.

Hunter, C., with D. Harman. Adult literacy in the United States. New
York: McGraw Hill Book Co., 1979.

Much of the material in this volume updates and elaborates on
the earlier Harman (1970) article. Here we consider just the first
two chapters. In Chapter I of this volume, Hunter and Harman
differentiate two types of literacy: conventional and functional
literacy. Conventional literacy is defined as the "ability to read,
write and comprehend texts on familiar subjects and to understand
whatever signs, labels, instructions and directions are necessary to
get along in one's environment." Functional literacy is "the
possession of skills perceived as necessary by particular persons and
groups to fulfill their own self-determined objectives as family and
community members, citizens, job-holders..." This definition
includes the ability to gain access to information they may want to
use.

In Chapter II, the authors examine demographic data in an effort
to delineate the illiterate population of the United States. For
this purpose they take high school graduation as the criterion of
literacy. Although this is in many ways an arbitrary and
inappropriate measure, there is abundant data on high school
completion. They then characterize the illiterate American
population in terms of several social and economic variables. They
indicate that although the exact number of illiterates is not known,
and although more individuals are completing high school and
achieving "acceptable" levels of literacy, well over one-third of the
adult population suffers some educational disadvantage. Hunter and
Harman stress that

Kirsch, I., & J. Guthrie. The concept and measurement of functional
literacy. Reading Research Quarterly, 1977-78, 13(4), 485-507.

This article reviews the literature on the concept and 
measurement of functional literacy. The authors differentiate the 
various meanings applied to the term functional literacy and their 
implications for its definition and measurement. Various measures 
used in past surveys to assess functional literacy are critically 
examined and specific limitations of each approach are discussed. 

Peck, C., & M. Kling. Adult literacy in the Seventies: Its definition
and measurement. Journal of Reading, 1977, 20(8), 677-682.

After summarizing definitions and estimates of functional
literacy/illiteracy that emerged from several major studies, this
article critically examines several instruments developed to assess
functional literacy levels. Special attention is given to the
assessment of real-life reading skills. The article concludes by 
noting that any definition and assessment is relevant only to a given 
subpopulation rather than the United States as a whole. 

Smith, L. L. Literacy: Definitions and implications. Language Arts,
1977, 54(2), 135-138.

Although somewhat uncritically presented, this article provides
a lay-oriented summary of the literature on issues involved in 
defining functional literacy. Based on identification of these 
issues, the author outlines the elements needed in any definition of 
functional literacy. 






APPENDIX B 



SUMMARY OF COMMON TEST SCORES 



SCORES FREQUENTLY ASSOCIATED WITH NORM REFERENCED TESTS



PERCENTILE RANK

DEFINITION

The percentile rank establishes a student's standing relative to
a norm group in terms of the percentage of students who scored at
or below his or her raw score. For example, a student who scored
at the 98th percentile achieved a raw score which was equal to or
higher than the raw scores of 98 percent of the norm group who took
the same test under the same conditions.

MAJOR ADVANTAGES

1. Percentiles show the relative standing of individuals compared
to a normative group.

2. They are familiar to most public school personnel, though
probably not the general public.

3. Percentiles are relatively easily explained.

MAJOR DISADVANTAGES

1. Percentiles are frequently confused with the percent of the
total number of test items answered correctly.

2. Since the percentile scale does not have equal units of
measurement, percentiles should not be used in the computation of
group statistics.
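The distinction between a percentile rank and a percent correct score
can be made concrete with a short calculation. The following Python
sketch is illustrative only: the function names, the ten norm-group
scores, and the 40-item test length are hypothetical values chosen for
the example, not data from any published test.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores at or below raw_score."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    at_or_below = sum(1 for score in norm_scores if score <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


def percent_correct(raw_score, n_items):
    """Percentage of the test items answered correctly."""
    return 100.0 * raw_score / n_items


# Hypothetical raw scores for a (very small) norm group on a 40-item test.
norm_group = [12, 15, 18, 20, 21, 23, 25, 26, 28, 30]
raw = 25

print(percentile_rank(raw, norm_group))  # 70.0: standing within the norm group
print(percent_correct(raw, 40))          # 62.5: share of items answered correctly
```

The same raw score of 25 yields two different numbers because the two
scores answer different questions, which is the source of the confusion
noted in the first disadvantage above.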



GRADE EQUIVALENT SCORE

DEFINITION

The grade equivalent score indicates the performance of a student
on a particular test relative to the median performance of students
at a given grade level and month; e.g., a fifth grader who receives
a grade equivalent score of 8.2 on a reading test achieved the same
raw score performance as the typical eighth grader in the second
month of eighth grade would be expected to achieve on the same
fifth grade test.

MAJOR ADVANTAGES

1. It appears easy to communicate the standing of an individual
student relative to a grade level (most people believe they
understand what is meant by grade equivalent scores).

MAJOR DISADVANTAGES

1. Grade equivalents are easily misunderstood and misinterpreted.

2. Achievement expressed in grade equivalent score units cannot be
meaningfully compared with each other in several instances.

a. Grade equivalent scores cannot be meaningfully compared for the
same student (or group of students) over time.

b. Grade equivalent scores cannot be meaningfully compared for the
same student (or group of students) across subject matter areas.

c. Grade equivalent scores cannot be meaningfully compared for the
same student (or group of students) across different tests.

3. Many grade equivalent scores are statistical projections
(interpolations or extrapolations). In the later grades it is not
uncommon to find grade equivalent scores of two or three grade
levels above or below the student's actual grade level, but these
scores are of doubtful accuracy.

4. The grade equivalent scale is not composed of equal sized units.
Having equal sized units implies that the underlying difference
between any two scores is the same throughout the scale.






STANDARD SCORES

DEFINITION

Standard scores are derived from raw scores, but express the results
of a test on the same numerical scale regardless of grade level,
subject area or test employed.

MAJOR ADVANTAGES

1. Since the mean and standard deviation of the standard score
scales are prespecified, a student's standard score immediately
communicates two important facts about his or her performance on
that test:

a. Whether the student's score is above or below the mean.

b. How far above or below the mean, in standard deviation units,
his or her performance is.

2. The constant numerical scale of standard scores facilitates
comparisons:

a. Across students taking the same test.

b. Across subject matter areas for the same student.

3. Standard scores are derived in a way that maintains the equal
interval property in their units which is absent in percentile and
grade equivalent scores. Therefore, summary statistics may be
meaningfully interpreted when calculated on standard scores.

MAJOR DISADVANTAGES

1. The most useful interpretation of standard scores requires some
knowledge of statistics (i.e., mean and standard deviation) and
hence may not be appropriate for audiences who have not been
exposed to these concepts (e.g., parents, the news media).

2. Given the variety of standard scores available, there may be
potential confusion in expressing the same test performance with so
many different numerical values.

3. The conversion of raw scores to standard scores may either
maintain the shape of the distribution observed, or may transform
the distribution to another, more interpretively convenient shape
(e.g., the normal distribution); and the procedures employed in
specifying the conversion process may not be immediately obvious.



NORMAL CURVE EQUIVALENTS (NCEs)

DEFINITION

A standard score system having 99 equal intervals. The average
corresponds to the 50th centile; the 1st and 99th NCEs correspond
to the 1st and 99th centiles. Range: generally 1-99, but can be
higher and lower.

MAJOR ADVANTAGES

1. Same as standard score systems.

2. Permit aggregation of data from a wide variety of tests.

MAJOR DISADVANTAGES

1. They are relatively new.

2. They depend upon standard scores or percentiles.

3. Not all test publishers use them.
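The relationship between percentiles and NCEs described above can be
checked numerically. The Python sketch below assumes normally
distributed scores and uses the conventional NCE scale (mean 50,
standard deviation 21.06), which is what makes the 1st, 50th and 99th
NCEs line up with the same centiles. The constant and the function name
are not given in this Guide and are supplied here as assumptions for
illustration.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # conventional value; aligns NCEs 1 and 99 with centiles 1 and 99


def percentile_to_nce(percentile):
    """Convert a percentile rank (between 0 and 100, exclusive) to an NCE."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100.0)  # z-score under the normal curve
    return NCE_MEAN + NCE_SD * z


for p in (1, 25, 50, 75, 99):
    print(p, round(percentile_to_nce(p), 1))
```

Away from the three anchor points the two scales differ, which reflects
the fact that percentiles do not have equal units while NCEs are
designed to.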






EXPANDED SCALE SCORES

DEFINITION

Expanded scale scores are a type of standard score whose scale is
designed to extend across grade levels and whose mean increases
progressively as the grade level increases.

MAJOR ADVANTAGES

1. Expanded scores facilitate longitudinal comparisons of an
individual across grade levels.

2. Expanded scale scores provide the vehicle for relating a
performance obtained at one grade level to the norm group of
another. This is useful when the appropriate level of a test to be
administered to a student is judged to be other than that of his or
her grade level (i.e., functional level testing).

3. Since they were designed as equal interval, their scores may be
mathematically manipulated (e.g., averaged).

MAJOR DISADVANTAGES

1. Different test publishers use different terms to refer to their
expanded scale scores (e.g., growth scale values, achievement
development scale scores, standard score, scale score) and this
may be confusing when considering results from different tests.

2. Different tests use different ranges and standard deviations in
deriving their expanded scale scores. Thus, results from different
tests expressed in expanded scale score units cannot be readily
compared.

3. The statistical properties of expanded scale scores are often
not as uniform as theoretically desired.

STANINES

DEFINITION

Stanines are a standard score scale consisting of nine values with
a mean of five and a standard deviation of two.

If the distribution of scores is normal, each stanine includes a
known proportion of the scores in the distribution.

MAJOR ADVANTAGES

1. As in all standard scores, stanines have the same meaning across
different tests, different grade levels and different content
areas.

2. Stanines consist of only nine possible scores and thus may be
easier to communicate to audiences not familiar with measurement
terminology. Verbal labels may be given to each stanine value to
facilitate interpretation.

MAJOR DISADVANTAGES

1. Since some of the stanines encompass a wide range of scores,
their use in reporting can be insensitive to differences between
students' performance that are more apparent from the use of
other test scores.
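The "known proportion" of scores in each stanine, and the loss of
detail noted as a disadvantage, can both be seen in a small example.
The Python sketch below assigns a stanine from a percentile rank using
the conventional normal-curve proportions (4, 7, 12, 17, 20, 17, 12, 7
and 4 percent for stanines 1 through 9). These proportions, the function
name, and the handling of scores that fall exactly on a boundary are
assumptions supplied for illustration; they are not specified in this
Guide.

```python
from bisect import bisect_left

# Cumulative percentage of scores at the upper edge of stanines 1 through 8,
# from the conventional proportions 4, 7, 12, 17, 20, 17, 12, 7 (and 4).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank):
    """Return the stanine (1-9) for a percentile rank between 0 and 100.

    A rank exactly on a boundary is placed in the lower stanine; that
    tie-breaking rule is an assumption of this sketch.
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return bisect_left(STANINE_UPPER_BOUNDS, percentile_rank) + 1


for rank in (2, 41, 50, 59, 98):
    print(rank, stanine_from_percentile(rank))
```

Percentile ranks of 41 and 59 both fall in stanine 5, which shows how
a single stanine can mask a fairly large difference between two
students.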



SCORES FREQUENTLY ASSOCIATED WITH OBJECTIVE REFERENCED TESTS



OFFJNJTfOH 

The nuwbpr r.f items on d test or 
subtest answered correctly by the 
student. 



HAJOR "ADVANTAGES 



MAJOR DISADVANTAGES 



1. Virtually no statistical or measure- 
ment expertise is needtl to calculate 
raw scopes. 

2. Raw scores are the necessary first step 
in expressing test performance in any of 
a number of other ways (e.g. standard 
scores, percentiles.) . 



1. 



The proportion of the total number 
of items answered correctly by the 
student. 



1. Very little statistical or measurement 1 
expertise Is required to understand this 
expression of test performance. 

2. If the content area is sufficiently 
represented by the items on the test, 

the percent correct provides an express .'jn 
ot the proportion of the subject matter 
mas te red[ j^y t he student. . 



By themselves, raw scores offer no indication 
as to how a student who has mastered the skills 
represented on the test "should" perform 
(i.e., criterion referenced) or how other 
students at the same grade level have performed 
(i.e., norm referenced.) 



No notion of test difficulty or expected
performance is contained in this score.
Unless accompanied by a standard for mastery
or information as to how a student's peers have
performed on the test, misinterpretations may
arise.



When a standard for mastery has been
applied to a set of items for a specific
objective, a student's performance
in terms of that objective is expressed
as having mastery or non-mastery of
the objective.



1 



1. The objective mastery score compares the
student's performance on that objective
to a judged standard of what he or she
should know of the skills required to master
it. This score can be very useful in
diagnosing a student's specific strengths
and weaknesses.



When the subject matter requires a
successive accumulation of skills (e.g.,
elementary math), objective mastery
scores may be extremely useful in
monitoring the progress of students in
specific skill areas.



2. 



Objective mastery scores are difficult to
compare across different tests. Items designed
to measure the same objective may differ in
difficulty or have different standards for
mastery on different tests.



If a purpose in testing is to differentiate
among students, objective mastery scores do
not present a very useful index. Different
raw scores above or below the mastery level
are viewed as the same: either mastery or
non-mastery.
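The three scores defined above (raw score, percent correct, and objective mastery) can be sketched for one objective as follows. This is an illustration only: the function name, the answer key, and the mastery cutoff of 4 items out of 5 are hypothetical and are not taken from any test listed in this guide.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ObjectiveResult:
    raw_score: int          # number of items answered correctly
    percent_correct: float  # proportion of items correct, as a percentage
    mastery: bool           # True if the raw score meets the mastery standard


def score_objective(responses: Sequence[str],
                    answer_key: Sequence[str],
                    mastery_cutoff: int) -> ObjectiveResult:
    """Score one student's responses on the items for a single objective."""
    if not answer_key or len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have the same nonzero length")
    raw = sum(r == k for r, k in zip(responses, answer_key))
    percent = 100.0 * raw / len(answer_key)
    return ObjectiveResult(raw, percent, raw >= mastery_cutoff)


# Hypothetical five-item objective with a judged standard of 4 correct.
key = ["b", "d", "a", "c", "a"]
student_a = score_objective(["b", "d", "a", "c", "b"], key, mastery_cutoff=4)
student_b = score_objective(["b", "d", "a", "c", "a"], key, mastery_cutoff=4)
```

In this sketch the two students have different raw scores (4 and 5) but the same mastery score, which is the loss of differentiation noted among the disadvantages above.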






APPENDIX C 



PUBLISHED TESTS OF BASIC READING SKILLS 



Most reading tests are part of a multisubject battery. The first
list in this appendix provides information on such tests. The second
list gives information on tests which measure only reading skills. These
lists are modified versions of the ones in Anderson, B. L., R. J.
Stiggins, and S. B. Hiscox, Guidelines for Selecting Basic Skills and
Life Skills Tests, Portland, OR: Northwest Regional Educational
Laboratory, 1980.

Although the lists are intended to include all appropriate tests 
published since 1970, some may have been inadvertently overlooked.
Inclusion on the list does not imply endorsement. 



Appendix C (continued) 



MULTISUBJECT ACHIEVEMENT BATTERIES

Tests and Subscores                                 Grade       Publication  Publisher**
                                                    Level(s)*   Date

Adult Basic Learning Examination (ABLE)             Adult       1974         PSYCH CORP
    Reading
    Arithmetic
    Spelling
    Vocabulary

Adult Performance Level Survey (APLS)               9-Adult     1976         ACT
    Reading
    Computation
    Writing
    Identifying Facts and Terms
    Problem Solving

Alaska Instructional Diagnostic System (AIDS)       1-8         1977         SRRC
    Reading
    Mathematics

American School Achievement Tests, Revised          1-9         1975         BMC
Edition (ASAT)
    Reading
    Arithmetic
    Language
    Spelling
    Social Studies
    Science

California Achievement Tests Forms C & D (CAT)      K-12        1977         CTB
    Reading
    Mathematics
    Language
    Spelling
    Reference Skills

*Not all subtests are available at all grade levels.
**Names, addresses and phone numbers are given in Appendix H.






Appendix C (continued) 



MULTISUBJECT ACHIEVEMENT BATTERIES

Tests and Subscores                                 Grade       Publication  Publisher
                                                    Level(s)    Date

California Assessment Program Survey of Basic       12          1974         CSDOE
Skills
    Reading
    Mathematics
    Written Expression
    Spelling

Comprehensive Assessment Program                    PreK-12     1980         SFC
The Achievement Series
    Reading
    Mathematics
    Language Arts

Comprehensive Tests of Basic Skills Expanded        K-12        1976         CTB
Edition Form S & T (CTBS)
    Reading
    Mathematics
    Language Arts
    Reference Skills
    Science
    Social Studies

Criterion Test of Basic Skills                      K-8         1976         ATP
    Reading
    Arithmetic

Diagnostic Skills Battery                           1-8         1976         STS
    Reading
    Mathematics
    Language Arts

Iowa Tests of Basic Skills Multi-level              3-9         1978         RPC
Edition Forms 7 & 8
    Reading Comprehension
    Mathematics Skills
    Language Skills
    Work-Study Skills
    Vocabulary

Iowa Tests of Educational Development: SRA          9-12        1974         SRA
Assessment Survey
    Reading
    Mathematics
    Language Arts
    Social Studies
    Science



Appendix C (continued) 



MULTISUBJECT ACHIEVEMENT BATTERIES

Tests and Subscores                                 Grade       Publication  Publisher
                                                    Level(s)    Date

Metropolitan Achievement Tests (METRO '78)          K-12        1978         PSYCH CORP
    Reading Comprehension
    Mathematics
    Language
    Social Studies
    Science

National Educational Development Tests              7-10        1974         SRA
    Mathematics Usage
    English Usage
    Social Studies Reading
    Natural Sciences Reading
    Word Usage

Scholastic Testing Service Educational              2-12        1976         STS
Development Series Scholastic Tests
    Reading
    Mathematics
    English
    Social Studies
    Science
    Solving Everyday Problems
    USA in the World
    Nonverbal Ability
    Verbal Ability
    School Interests
    School Plans
    Career Plans

Science Research Associates Achievement             K-12        1978         SRA
Series (ACH) Forms 1 & 2
    Reading
    Mathematics
    Language Arts
    Social Studies
    Science
    Reference Materials
    Applied Skills

Science Research Associates High School                         1973         SRA
Placement
    Reading
    Arithmetic or Modern Math
    Language Arts
    Social Studies
    Science



Appendix C (continued) 



MULTISUBJECT ACHIEVEMENT BATTERIES

Tests and Subscores                                 Grade       Publication  Publisher
                                                    Level(s)    Date

Science Research Associates Norm Referenced/        4-10        1977         SRA
Criterion Referenced Testing Program
    Reading
    Mathematics

Sequential Tests of Educational Progress:           1-12        1979         AWPC
Series III Levels E-J
    Reading
    Mathematics Computation
    Mathematics Basic Concepts
    Writing Skills
    Social Studies
    Science
    Study Skills/Listening
    Goal Orientation Index

SOI Learning Abilities Test                         1-11        1975         SOI
    Reading
    Arithmetic

Stanford Achievement Test 1973 Edition (SAT)        1-9         1973         PSYCH CORP
    Reading
    Mathematics
    Social Studies
    Science
    Listening Comprehension
    Spelling

Stanford Test of Academic Skills (TASK)             8-Adult     1975         PSYCH CORP
    Reading
    Mathematics
    English

Tests of Adult Basic Education (TABE)               Adult       1976         CTB
    Reading
    Mathematics
    Language Arts



Appendix C (continued) 
MULTISUBJECT ACHIEVEMENT BATTERIES

Tests and Subscores                                 Grade       Publication  Publisher
                                                    Level(s)    Date

Tests of Achievement and Proficiency Form T         9-12        1978         RPC
(TAP)
    Reading Comprehension
    Mathematics
    Written Expression
    Social Studies
    Science
    Using Sources of Information
    Applied Proficiency Skills

United States Employment Service Basic              Adult       1974         USDL
Occupational Literacy Test (USES BOLT)
    Reading Vocabulary
    Reading Comprehension
    Arithmetic Computation
    Arithmetic Reasoning

Wide Range Achievement Test Revised Edition         K-Adult     1978         JA
(WRAT)
    Reading
    Arithmetic
    Spelling



Appendix C (continued) 


READING TESTS

Tests                                               Grade       Publication  Publisher
                                                    Level(s)    Date

Analysis of Skills: Reading                         1-8         1976         STS
Analytical Reading Inventory                        2-9         1977         CEMPC
Clarke Reading Self-Assessment Inventory            11-Adult    1978         ATP
Criterion Referenced: Reading Tactics               7-12        1977         SFC
Cutrona Reading Inventory (Oral)                    K-Adult     1975         CEI
Developmental Reading: Diagnostic/                  3-Adult     1975         PSAA
    Prescriptive Tests: Fundamental Stage
Diagnostic Reading Scales                           1-          1975         CTB
Diagnostic Reading Test: Pupil Progress Series      1-8         1970         STS
Diagnostic Screening Test: Reading                  1-12        1979         SC
Fountain Valley Teacher Support System in           7-12        1976         RLZA
    Secondary Reading
Gates-McGinitie Reading Tests, Second Edition       1-12        1978         RPC
Gates-McGinitie Reading Tests, First Edition        1-12        1970         RPC
Individualized Criterion Referenced Testing:        K-8         1976         EDC
    Reading (ICRTR)
Literacy Assessment Battery                         Adult       1976         ERIC
Mastery: An Evaluation Tool: (SOBAR)                K-9         1976         SRA
McCarthy Individualized Diagnostic Reading          2-Adult                  EPS
    Inventory - Revised
McGrath Diagnostic Reading Test                     1-13        1976         MPC
Minimum Reading Competency Test                     1-13        1976         KHSD
Nelson-Denny Reading Test Forms C & D               9-12        1976         RPC
Nelson Reading Skills                               3-9         1977         RPC
Objectives-Referenced Bank of Items and Tests:      K-Adult     1975         CTB
    Reading and Communication Skills (ORBIT: RCS)
Oral Word Recognition Test                          1-13        1973         MPC
Performance Assessment in Reading (PAIR)            7-9         1978         CTB
Power Reading Survey Test                           1-12        1975         BFA
Prescriptive Reading Performance Test               K-12        1978         WPS
Reading Skills Diagnostic Test                      2-8         1971         BP
Reading Skills Competency Test                      K-7         1979         CARE
SRA Reading Index                                   9-Adult     1974         SRA
Stanford Diagnostic Reading Test                    1-13        1976         PSYCH CORP
Test of Reading Comprehension (TORC)                1-8         1978         PRO-ED
Woodcock Reading Mastery Tests (WRMT)               -12         1973         AGS



APPENDIX D



PROCEDURES FOR SCORING WRITING SAMPLES * 

While many objective tests can be machine scored, writing tests that
rely on writing samples must be hand scored by persons trained to use
designated criteria and performance standards. Several different methods
have been devised for scoring writing samples, depending on the
assessment purpose. Appropriateness depends upon what information is
needed, how it will be used, and what resources are available. Some
scoring methods are more complicated and costly than others.

Three scoring methods will be discussed here: holistic, analytic and
primary trait scoring. The methods are similar in that all require
careful training of scorers (usually language arts teachers) in a group
setting for about half a day. Once scorers are trained, scoring is
usually conducted in a carefully structured setting to allow reading of
papers by two readers (and a resolver, if necessary) and exercising of
quality control procedures. Under such conditions high reliability can
be achieved.
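The two-reader arrangement can be sketched as below. The guide does not state when a resolver is called in or how scores are combined, so the rule used here is an assumption for illustration only: scores within one point of each other are averaged, and a larger gap requires a resolver whose score is averaged with the closer of the two readings. The function name and the 1-to-4 scale default are likewise hypothetical.

```python
from typing import Optional, Tuple


def resolve_holistic_score(first: int,
                           second: int,
                           resolver: Optional[int] = None,
                           max_gap: int = 1,
                           scale: Tuple[int, int] = (1, 4)) -> float:
    """Combine two readers' scores, calling on a resolver when they disagree.

    Assumed rule (not taken from the guide): readings within `max_gap`
    points are averaged; otherwise the resolver's score is averaged with
    whichever reading is closer to it.
    """
    low, high = scale
    for score in (first, second, resolver):
        if score is not None and not low <= score <= high:
            raise ValueError(f"score {score} is outside the {low}-{high} scale")
    if abs(first - second) <= max_gap:
        return (first + second) / 2
    if resolver is None:
        raise ValueError("readers disagree; a resolver score is required")
    closer = min((first, second), key=lambda s: abs(s - resolver))
    return (resolver + closer) / 2


adjacent = resolve_holistic_score(4, 3)               # readers nearly agree
resolved = resolve_holistic_score(4, 2, resolver=4)   # third reading needed
```

A program adopting a different discrepancy rule would change only the `max_gap` comparison and the final averaging step.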

Holistic Scoring 

Holistic scoring involves reading a paper for an overall or "whole" 
impression. No specific trait, such as organization, syntax or 
originality, is individually addressed. A reader makes a judgment in 
much the same way that he or she decides whether a novel or an essay in 
Newsweek magazine is superior, mediocre, or slipshod. 

Raters are asked to make no marks upon the paper. The objective is
to score quickly, trusting first impressions. 

Raters use "range finders" as guides. These are actual student 
papers, chosen for their representativeness, from the total group of 
papers to be scored. There is a range finder for each score level: 
e.g., 4, 3, 2 and 1. Virtually all trained readers agree on the score 
each paper should receive, so these papers serve as effective models to 
assist raters in assigning scores.

Range finder papers are intended to represent the approximate
midpoint of each range. For example, a 4 range finder should not be
considered the best possible paper from the sample. Rather, it should be
considered a "middle 4"; that is, some 4's in the sample will be a little
better, some not quite so good.

Range finders are intended as guides. Clearly, not every contingency
can be anticipated. Readers are encouraged to (1) trust their own
judgment, and (2) confer with a head reader to resolve any unusual
problems.

One holistic score is determined to represent the overall quality of
the writing. Although the specific strengths and weaknesses of a given
paper are not delineated, final reports to parents and others can
include descriptions of the typical characteristics of papers receiving
each score.






Analytical Scoring 

Analytical scoring is a trait-by-trait analysis of a paper's merit.
Individual traits considered important to any piece of writing in any
context are selected for analysis. For example, papers of students asked
to write a letter expressing an opinion may be scored on ideas,
organization and wording.

The traits are scored one at a time. The scorer's impression of one
trait should not influence the scoring of any other trait. Readers are 
presented with guidelines for scoring each trait. The guide presents an 
elaboration of each point on the rating scale for each trait. Scores are 
reported separately for each trait. Range finders such as those used in 
holistic scoring are sometimes used along with the scoring guides. 



Primary Trait Scoring 

Primary trait scoring is similar to analytical scoring in that it 
focuses on one or more specific characteristics of a given piece of 
writing. But while analytical scoring attempts to isolate generally
important characteristics, primary trait analysis is situation specific.
That is, the most important (or primary) trait(s) in a letter to the
editor will not likely be the same as that (those) in a set of directions
for assembling a toy.

The primary trait system is based on the premise that all writing is 
done in terms of an audience, and that successful writing will have the 
desired effect upon that audience, whether newspaper reader or employer. 
In scoring, papers are judged on the likelihood of their producing the 
desired response. 

Because they are situation-specific, primary traits differ from item 
to item, depending on the nature of the assignment. Suppose a student 
were asked to give directions for taking the bus from home to school. 
The primary trait might then be sequential organization; any clear,
unambiguous set of directions would necessarily be well organized. The
chart on the next page summarizes the key features of the scoring 
procedures discussed above. 






Comparison of Alternatives

               Holistic                Analytical              Primary Trait

GOAL           Overall impression      Trait-by-trait          Situation-specific
                                       analysis                analysis

PREPARATION    Allow 1/2 day to find   1 day to identify       Same as analytical
               range finders; 1/2      traits; 1 day per
               day to train readers    trait to develop
                                       criteria; 1 day to
                                       refine scoring; 1/2
                                       day to train readers

SCORING TIME   1 to 2 min. per paper   1 to 2 min. per trait   1 to 2 min. per trait
               depending on length     per paper               per paper

CONTEXT        Contexts where rank     Contexts where skill    Contexts where skill
               order is useful         analysis is useful      analysis is useful
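The scoring-time figures in the comparison can be turned into a rough planning estimate. The sketch below is illustrative only: it assumes that each paper is read by two readers, as described earlier in this appendix, that each reading takes the 1 to 2 minutes shown in the chart, and that resolver readings and breaks are ignored. The function name and its parameters are hypothetical.

```python
from typing import Tuple


def estimated_scoring_minutes(papers: int,
                              method: str,
                              traits: int = 1,
                              readers: int = 2) -> Tuple[int, int]:
    """Return a (low, high) estimate of total reading time in minutes."""
    if papers < 0 or traits < 1 or readers < 1:
        raise ValueError("papers must be >= 0; traits and readers must be >= 1")
    if method == "holistic":
        readings = papers              # 1 to 2 min. per paper
    elif method in ("analytical", "primary_trait"):
        readings = papers * traits     # 1 to 2 min. per trait per paper
    else:
        raise ValueError(f"unknown scoring method: {method}")
    return (1 * readings * readers, 2 * readings * readers)


holistic_estimate = estimated_scoring_minutes(500, "holistic")
analytical_estimate = estimated_scoring_minutes(500, "analytical", traits=3)
```

For 500 papers this gives 1,000 to 2,000 reader-minutes for holistic scoring and 3,000 to 6,000 for analytical scoring on three traits, before any preparation time is counted.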



The following references provide further information on direct measures of 
writing. 



Spandel, V., & R. Stiggins. Direct Measures of Writing Skill: Issues
and Applications. Portland, OR: Northwest Regional Educational
Laboratory, 1980. ($4.25 prepaid)

This 64-page monograph presents a status of writing assessment, 
an overview of direct writing assessment procedures and information 
on how to adapt writing assessment to specific purposes. 

Bridgeford, N., & R. Stiggins. A Consumer's Guide to Writing
Assessment. Portland, OR: Northwest Regional Educational
Laboratory, 1981.



The Guide compares direct and indirect assessment methods, 
describes how to develop direct assessment measures and select 
indirect measures. Extensive lists are provided of organizations and
consultants across the country who can provide assistance in
developing direct measures. Profiles of indirect measures are also 
provided. 






APPENDIX E



PUBLISHED TESTS OF WRITING SKILLS 



The test list is taken from Bridgeford, N., & R. Stiggins. A
Consumer's Guide to Writing Assessment, Portland, OR: Northwest Regional
Educational Laboratory, 1981. Although the list is intended to include
all tests published since 1970 and designed for grade 8 or above, some
tests may have been inadvertently overlooked. Inclusion on the list does
not imply endorsement.



83 



Appendix E (continued)
PUBLISHED TESTS OF WRITING SKILLS

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

ACT Assessment English Usage Test                   9-13        5/year       ACTP
(subtest of an achievement battery)
    Punctuation, grammar, style,
    diction, logic, organization

Adult Performance Level*                            Adults      1976         ACTP
Adult Survey - Writing Subscore
(score based on items embedded
in a longer test)
    Ability to recognize appropriately
    written material on various forms and
    documents used in everyday life

Adult Performance Level                             9-12        1976         ACTP
High School Survey - Writing
Subscore (score based on items
embedded in a longer test)
    Ability to recognize appropriately
    written material on various forms and
    documents used in everyday life

American School Achievement Tests:                  7-9         1963         BMC
Language, Spelling (subtests of an
achievement battery)
    Correct usage, punctuation,
    capitalization, sentence
    recognition, grammar, spelling

APL Content Area Measures (series of                9-12,       1977         ACTP
individual tests, each including a                  Adults
writing subscale) tests available in
Occupational Knowledge, Community
Resources, Consumer Economics, Health
and Governmental Law
    Ability to recognize appropriately
    written materials on forms and documents
    used in everyday life

*Writing sample included



Appendix E (continued)

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Analysis of Skills (ASK)                            2-8         1977         STS
Language Arts
    Capitalization and punctuation, usage,
    sentence knowledge, composing process

Basic Skills Assessment Program*                    7-12        1977         AWPC
Writer's Skills Test
    Spelling, punctuation, capitalization,
    usage, logic, evaluation

California Achievement Tests/Language               1-12        1970         CTB
(Subtest of an achievement battery)
    Language: auding, mechanics, usage and
    structure, spelling
    Language Mechanics: capitalization and
    punctuation
    Language Expression: usage, sentence
    structure, and paragraph organization

California Achievement Tests/Language               K-12        C: 1977      CTB
(Subtest of an achievement battery)                             D: 1978
    Mechanics, expression, spelling

College English Placement Test*                     College     1969         RPC
    Topic selection, organizing materials
    for presentation, editing, composition

College-level Examination Program (CLEP)*           College     Variable at  CEEB
General Examination in English                                  CLEP test
Composition and Subject Examinations in                         centers
College Composition and Freshman English
    English Composition: Logical and
    structural relationship within sentences;
    economy, precision and clarity of
    communication; logical and attention
    to purpose and audience
    College Composition: Sentence structure,
    paragraph and essay construction, style,
    logic, language history and reference
    skills
    Freshman English: Style, logic, syntax,
    usage, punctuation, paragraph
    construction, dictionary and research
    skills

*Writing sample included



Appendix E (continued) 



Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

College Outcome Measures Project,*                  College     1980         ACTP
Composite Examination Writing Subscale
(Subtest of an achievement battery)
    Ability to address an audience, organize
    and develop an essay and use language
    and sentence structure

Comparative Guidance and Placement                  College     1973
Program Sentences Test, also referred to
as Written English Expression Test
(Subtest of achievement battery)
    Grammar, usage, word choice,
    sentence structure, logical
    relationships within sentences,
    clarity of expression

Comprehensive Assessment Program High               9-12 and    1980         SFC
School Subject Tests/Writing and                    Adult
Mechanics Test, Language Test
    Language Test: Spelling,
    punctuation and capitalization,
    correctness of expression
    Writing and Mechanics Test:
    Paragraph development, usage,
    paragraph structure

Comprehensive Assessment Program -                  2-8         1980         SFC
Spelling, Capitalization and Punctuation,
Grammar and Language Total
(Subscores of an achievement battery)
    Spelling, capitalization, punctuation,
    grammar

Comprehensive Tests of Basic Skills/                2-10        Q-1968       CTB
Language (Subtest of an achievement                             R-1969
battery)
    Mechanics, expression, spelling

Comprehensive Tests of Basic Skills/                1-12        S-1973       CTB
Language (Subtest of an achievement                             T-1975
battery)
    Mechanics, expression, spelling

*Writing sample included



Appendix E (continued) 

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Content Evaluation Series                           7-9         1969         RPC
Language Arts Test/Language Ability,
Composition (Subtests of achievement
battery)
    Language Ability: Sentence structure,
    word form and function, mechanics,
    diction
    Composition: Invention, arrangement,
    and style

Descriptive Tests of Language Skills/               9-12,       1977-81      ETS
Sentence Structure                                  College
    Using complete sentences; using
    coordination and subordination
    appropriately

Descriptive Tests of Language Skills/               9-12,       1977-81      ETS
Logical Relationships                               College
    Categorizing ideas, using appropriate
    connectives, making analogies,
    recognizing principles of organization

Descriptive Tests of Language                       9-12,       1977-81      ETS
Skills/Usage                                        College
    Ability to use pronouns, modifiers,
    diction and idioms, verbs

Diagnostic Skills Battery/Language                  1-8         1977         STS
Arts (Subtest of an achievement battery)
    Capitalization and punctuation, usage,
    sentence knowledge and composing
    process

Educational Development Series/English              9-12        1972         STS
(Subtest of achievement battery)
    Grammar, capitalization, punctuation,
    spelling

Educational Development Series/English              1-12        1977         STS
(Subtest of an achievement battery)
    Capitalization, punctuation, usage

Essentials of English Tests                         7-12 & 13   1961         AGS
    Skills in spelling, grammatical
    usage, word usage, sentence structure,
    punctuation and capitalization



Appendix E (continued) 

Grade Publi- Publi- 

Level (s) cation sher 

Test Date 



Hoyum-Sanders English Test 2-8 1964 BEM 

(Four forms) 

Division one covers sentence 

recognition, capitalization, 

punctuation, contractions, 

possessives, spelling, correct 

usage, and alphabetization. 

The second and third divisions 

cover sentence recognition, 

capitalization, punctuation, 

correct usage, and reference 

materials such as guide words and 

index 

Iowa Tests of Basic Skills, Multilevel 3-9 1978 RPC 

Battery/Language Skills (Subtest of 
achievement battery) 

Spelling, capitalization, punctuation, 

usage 

Iowa Tests of Basic Skills, Primary K-6 1979 RPC 

Battery/Language Skills (Subtest of 
an achievement battery) 

Spelling, capitalization, 

punctuation, usage 

Iowa Tests of Educational Development/ 9-12 1971 SRA 

Language Usage (Subtest of an 
achievement battery) 

Punctuation, capitalization, 

manner of expression, word and 

sentence order, organization of ideas, 

spelling 

IOX Basic Skills Tests,* 9-12 1978 IOX 

Secondary Level-Writing Subtest 

Using words correctly, checking 

mechanics, selecting correct 

sentences, expressing ideas in

writing 



*Writing sample included



Appendix E (continued)

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Metropolitan Achievement Tests/Language*            1.5 to 12.9 1978         PSYCH CORP
(Subtest of an achievement battery)
    Punctuation and capitalization, usage,
    grammar and syntax, spelling, study
    skills. Listening comprehension at
    grades 1.5 to 12.9

Metropolitan Language Instructional                 1.5 to 9.9  1978         PSYCH CORP
Tests
    Punctuation and capitalization, usage,
    grammar and syntax, spelling, study
    skills. Listening comprehension at
    grades 1.5 to 4.9

Minnesota High School Achievement                   7-12        1976         AGS
Examinations/Language Arts Subtest
(Separate test for each grade,
three forms per grade)
    Content Areas:
    Gr. 7:  Language study skills, spelling,
            word knowledge, kinds of
            sentences, usage, sentence
            structure, punctuation,
            capitalization
    Gr. 8:  Spelling, vocabulary, kinds of
            sentences, faulty expression,
            verb usage, use of words, types
            of sentences, capitalization and
            punctuation, usage, general
            information, literature
    Gr. 9:  Spelling, vocabulary, sentences,
            sentence structure, punctuation,
            usage, composition, library,
            literature (interpretation),
            literature (knowledge)
    Gr. 10: Sentence structure, word
            discrimination, spelling, punctuation,
            diction, reading and literature,
            general information
    Gr. 11: Sentence structure, word
            discrimination, spelling, punctuation,
            organization, library skills,
            literary style, literary figures,
            quotations, literature
    Gr. 12: Spelling, vocabulary, punctuation,
            word discrimination, word usage,
            sentence structure, library skills,
            literature

*Writing sample in process



Appendix E (continued)

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Missouri College English Test                       College     1965         HBJ
    Mechanics and Effectiveness of
    Written Expression: punctuation,
    capitalization, grammar, spelling,
    sentence style and structure,
    paragraph organization

National Educational Development Tests/             7-10        1971         SRA
English Usage (Subtest of an achievement
battery)
    Ability to use such basic elements of
    correct and effective writing as
    punctuation, capitalization, diction,
    sentence reconstruction, and paragraph
    organization

Sequential Tests of Educational Progress            3.5 to 12.9 1979         AWPC
Intermediate and Advanced; Forms D-J*
Writing Skills (Subtest of achievement
battery)
    Capitalization and Punctuation: ability
    to recognize errors in mechanics or usage
    Word Structure and Usage: ability to
    detect errors in use of parts of speech
    embedded in sentences
    Sentence and Paragraph Organization:
    language construction skill and ability
    to recognize appropriate organization

Sequential Tests of Educational Progress            4-College   1971         ETS
Series II/Mechanics of Writing, English
Expression (Subtests of an achievement
battery)
    Mechanics: Spelling, capitalization,
    punctuation
    Expression: Ability to evaluate the
    correctness and effectiveness of
    sentences

Stanford Achievement Test/Spelling,**               1-9         1973         PSYCH CORP
Language (Subtests of achievement
battery)
    Spelling, capitalization, punctuation,
    usage, syntax, language sensitivity,
    dictionary and other reference skills

*Writing sample available in 1982



Appendix E (continued) 



Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Test of Adolescent Language                         6-12        1980         PRO-ED
    Ability to express thoughts in graphic
    form, ability to write, ability to
    understand and generate syntactic
    structures, ability to use language
    expressively

Test of Standard Written English                    9-12,       6/year       ETS
    Grammar, usage, sentence logic                  College

Test of Written Language*                           2-8         1978         PRO-ED
    Vocabulary, thematic maturity,
    ability to produce meaningful thought
    units, handwriting, spelling, word
    usage, style

Tests of Achievement and Proficiency                9-12        1978         RPC
(Subtest of achievement battery)
    Capitalization, punctuation, grammar,
    usage, organization, spelling

Walton-Sanders English Test (Four Forms)            9-13        1964         BEM
    Ability to recognize obvious errors in
    spelling, sentence structure, punctuation,
    the use of the past tense and past
    participle forms of verbs, the use of
    nominative and objective forms of
    pronouns, the use of English idioms,
    especially those involving a choice
    of prepositions, and other common
    faults

WRITES Senior High*                                 9-12        1979         CTB
    Mechanics, punctuation, usage,
    vocabulary, spelling, organization
    and format

Writing Proficiency Program*                        9-13        1979         CTB
    Sentence fragments, run-on sentences,
    subject-verb agreement, verb form, pronoun
    case, punctuation and mechanics,
    capitalization, spelling, paragraph and
    essay organization, paragraph coherence,
    topic sentences

*Writing sample included



Appendix E (continued) 



Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Writing Proficiency Program/Intermediate*           6-9         1981         CTB
System
    Sentence structures, sentence mechanics,
    paragraph structures, sentence fragments,
    run-on sentences, adjectives and adverbs,
    personal pronouns, verb tense, verb
    agreement with subject, misplaced
    modifiers, conjunctions, commas, end
    marks and quotation marks,
    capitalization, topic/summary sentences,
    sequence of sentences, use of transitions

*Writing sample included



APPENDIX F
TESTS OF EVERYDAY LITERACY ACTIVITIES

Test                                                Grade       Publication  Publisher
                                                    Level(s)    Date

Adult Performance Level Functional Literacy         9-Adult     1978         ACTP
    Test
Basic Skills Assessment                             8-12        1977
Everyday Skills Test (EDST)                         6-12        1975         CTB
IOX Basic Skills Test                               9-12        1978         IOX
Life Skills: Tests of Functional                    9-12                     RPC
    Competencies in Reading and Math
Minimum Essentials Test (MET)                       8-Adult     1980         SFC
NM Consumer Mathematics Test                        9-12        1973         NMDOE
Reading/Everyday Activities in Life (REAL)          9-Adult     1972         CAL-P
SRA Coping Skills: A Survey Plus Activities         7-Adult     1979         SRA
SRA Survival Skills in Reading & Mathematics        6-Adult     1976         SRA
STS Educational Development Series:                 2-12        1976         STS
    Scholastic Tests
Senior High Assessment of Reading                   10-12                    CTB
    Performance (SHARP)
Stories About Real-Life Problems                    5-8                      NIU
Test of Consumer Competencies                       8-12        1976         STS
Test of Everyday Writing Skills (TEWS)              9-12        1978         CTB



APPENDIX G 



CLERICAL TESTS INVOLVING ON-THE-JOB LITERACY SKILLS

Tests and Subscores                                 Grade       Publication  Publisher
                                                    Level(s)    Date

General Clerical Test                               9-Adult     1972         PSYCH CORP
    Clerical speed and accuracy
    Numerical ability
    Verbal facility

Short Employment Tests                              Adult       1972         PSYCH CORP
    Verbal
    Numerical
    Clerical

Short Tests of Clerical Ability                     Adult       1973         SRA
    Arithmetic
    Business vocabulary
    Checking
    Coding
    Directions
    Filing
    Language

SRA Clerical Aptitudes                              9-Adult     1973         SRA
    Office vocabulary
    Office arithmetic
    Office checking



APPENDIX H 



PUBLISHERS' NAMES AND ADDRESSES

ACTP        American College Testing Program
            P.O. Box 168
            Iowa City, IA 52240
            (319) 356-3711

AGS         American Guidance Service
            Circle Pines, MN 55014
            (612) 786-4343

ATP         Academic Therapy Publications
            20 Commercial Blvd.
            Novato, CA 94947
            (415) 883-3314

AWPC        Addison-Wesley Publishing Co., Inc.
            South Street
            Reading, MA 01867

BEM         Bureau of Educational Measurement
            Emporia State University
            Emporia, KS 66801

BFA         BFA Educational Media
            2211 Michigan Avenue
            P.O. Box 1795
            Santa Monica, CA 90406
            (213) 829-2901

BMC         Bobbs-Merrill Co., Inc.
            4300 West 62nd Street
            Indianapolis, IN 46268
            (317) 298-5400

BP          Brador Publications, Inc.
            Livonia, NY 14487

CAL-P       CAL Press, Inc.
            76 Madison Avenue
            New York, NY 10016
            (212) 685-0892

CARE        The Center for Applied Research in Education, Inc.
            Route 59
            West Nyack, NY 10994
            (914) 358-8991



Appendix H (continued) 



CEEB        College Entrance Examination Board
            Box 2815
            Princeton, NJ 08541

CEI         Cutronics Educational Institute
            128 W. 56th Street
            Bayonne, NJ 07002

CEMPC       Charles E. Merrill Publishing Company
            1300 Alum Creek Drive
            Columbus, OH 43216
            (614) 997-1221

CSDOE       California State Department of Education
            721 Capitol Mall
            Sacramento, CA 95814
            (916) 445-4688

CTB         California Testing Bureau/McGraw-Hill
            Del Monte Research Park
            Monterey, CA 93940
            (408) 649-8400

EPS         Educators Publishing Service
            75 Moulton Street
            Cambridge, MA 02138
            (617) 547-6706

ERIC        ERIC Document Reproduction Service
            P.O. Box 190
            Arlington, VA 22210

ETS         Educational Testing Service
            Princeton, NJ 08541

HBJ         Harcourt, Brace, Jovanovich, Publishers
            757 Third Avenue
            New York, NY 10017

IOX         Instructional Objectives Exchange
            Box 24095
            Los Angeles, CA 90024
            (213) 474-4531

JA          Jastak Associates, Inc.
            1526 Gilpin Avenue
            Wilmington, DE 19806
            (302) 652-4990









KHSD        Kern High School District
            2000 24th Street
            Bakersfield, CA 93301

MHBC        McGraw-Hill Book Company
            8171 Redwood Highway
            Novato, CA 94947

MPC         McGrath Publishing Company
            P.O. Box 9001
            Wilmington, NC 28402
            (919) 763-3757

NIU         Northern Illinois University
            Alan M. Voelker
            Curriculum & Instruction
            De Kalb, IL 60115
            (815) 753-1000

NMDOE       New Mexico State Department of Education
            Monitor
            Education Building
            State Capitol
            Santa Fe, NM 87501
            (505) 827-2429



PRO-ED      PRO-ED
            333 Perry Brooks Building
            Austin, TX 78701

PSAA        Paul S. Amidon & Associates, Inc.
            1966 Benson Avenue
            St. Paul, MN 55116

PSYCH CORP  The Psychological Corporation
            Harcourt, Brace, Jovanovich, Publishers
            757 Third Avenue
            New York, NY 10017
            (212) 888-3500



RLZA        Richard Zweig, Associates, Inc.
            20800 Beach Blvd.
            Huntington Beach, CA 92648
            (714) 536-8877

RPC         Riverside Publishing Company
            1919 South Highland Avenue
            Lombard, IL 60148
            (312) 629-9700









SC          Stoelting Company
            1350 S. Kostner
            Chicago, IL 60623
            (312) 522-4500

SFC         Scott, Foresman and Company
            1900 East Lake Avenue
            Glenview, IL 60025
            (312) 729-3000

SIUP        Southern Illinois University Press
            P.O. Box 3697
            Carbondale, IL 62901



SOI         SOI Institute
            214 Main Street
            El Segundo, CA 90245
            (213) 322-5995

SRA         Science Research Associates, Inc.
            155 N. Wacker Drive
            Chicago, IL 60606
            (800) 621-0664

SRRC        Southwest Regional Resource Center
            127 S. Franklin Street
            Juneau, AK 99801
            (907) 586-6806

STS         Scholastic Testing Service
            480 Meyer Road
            Bensenville, IL 60106
            (312) 766-7150

USDOL       United States Department of Labor
            Bureau of Labor Statistics
            1515 Broadway
            New York, NY 10036
            (212) 399-5405

WPS         Western Psychological Services
            12031 Wilshire Blvd.
            Los Angeles, CA 90025






